ABSTRACT

For nearly two decades we have witnessed an intensive development
of a statistical methodology for assessing length of life and
reliability of performance from empirical data. The initial stimulus
for research on statistical problems in life testing and reliability
came from the need to answer pressing practical questions which could
not be treated by the existing statistical techniques. Because life and
performance tests are so time-consuming and expensive to run, it is a
practical necessity to terminate them as soon as possible.

For the statistician this means developing estimation and decision
procedures for data which are severely curtailed in one way or another
long before all items on test have actually failed. The estimation is
more complicated when the data are truncated, i.e., when the observer
loses track of some individuals before death occurs. The product-limit
method of Kaplan and Meier is one way of estimating p(t) when the
mechanism causing truncation is independent of the mechanism causing
death.

This paper proposes alternative estimators and compares them to the
product-limit method. A computer simulation is used to generate the
times of death and truncation from a variety of assumed distributions.
No single estimator gives the best fit to the "true" distribution of
death under all situations. However, other estimators are shown to be
better than the product-limit estimator under all of the assumed
situations.
I. INTRODUCTION

Let the random variable T denote the time that elapses until an event
occurs; the event may, for example, be an equipment failure, an
individual's death, or the detection of a target. Denote by p(t) the
probability of survival to time t,

$$P\{T > t\} = p(t).$$

Picturesquely, T is called a lifetime, and p(t) is a survival
probability;

$$F(t) = 1 - p(t)$$

is the distribution function of T.

In the medical field, one might wish to estimate the probability p(t)
that a patient survives to time t after a certain surgical procedure
for cancer. In electronics, one wishes to estimate the probability of
continuous failure-free operation of a piece of equipment for time t.
In the military, one might be interested in the probability of
conducting a certain mission, under specified environmental conditions,
without detection by the enemy. The event of interest may be a human
death, an equipment malfunction, or a sonar detection. Following Kaplan
and Meier, Reference (1), this paper will refer to the event of
interest as a "death". The test element in the sample may be a human,
a radio, or a submarine. This paper will refer to the test elements as
"individuals".

Suppose that observed values of T are $t_1, t_2, t_3, \ldots, t_N$, so
that N lifetimes are observed. In this case an appropriate (unbiased)
estimate of survival to time t is

$$\hat p(t) = \frac{\text{number of } t_i\text{'s} > t}{N}.$$

Under many circumstances complete lifetimes are not observed;
censoring occurs at certain times, $x_i$, beyond which the life of an
individual is not known. In such cases construction of an appropriate
estimate of the survival probability is more difficult. In this paper
various estimates of survival probability are studied when lifetimes
are randomly censored. This means that censoring times are assumed to
be realizations of random variables independent of the actual
lifetimes.

The product-limit estimator of Kaplan and Meier, Reference (1), is an
accepted method of dealing with the problem of censored data. This
paper presents thirteen non-parametric estimators, including the
product-limit function. Censored data sets are simulated, and the
thirteen estimators are compared by examining their performance on the
simulated data bases.


II.   THEORY 

There are two approaches to the empirical estimation of the survival
probability, p(t):

(1) one may use the observed fraction of survivors at arbitrarily
selected times (step-function estimator), or

(2) one may focus attention on the times of the observed deaths
(point estimator).

The initial discussion is based on the assumption that all
observations are complete, i.e., it is assumed that all individuals
remain under observation until their time of death. This initial
assumption simplifies the discussion. Later in this paper, the
discussion is broadened to include incomplete data with observations of
both death and censoring events.

Survival Probabilities: No Censoring

Let $0 = t_0 < t_1 < t_2 < \cdots < t_{i-1} < t_i < \cdots$ be a
sequence of fixed times. Then if T is a lifetime,

$$p(t_i) = P\{T > t_i\},$$

and denote the conditional probability of survival to time $t_i$,
given survival to $t_{i-1}$, by

$$p(t_i \mid t_{i-1}) = P\{T > t_i \mid T > t_{i-1}\} = \frac{P\{T > t_i\}}{P\{T > t_{i-1}\}} = \frac{p(t_i)}{p(t_{i-1})}. \qquad (1)$$

If $p(t_{i-1}) = 0$, define $p(t_i \mid t_{i-1}) = 0$.

Then

$$p(t_i) = p(t_i \mid t_{i-1})\,p(t_{i-1}) = p(t_i \mid t_{i-1})\,p(t_{i-1} \mid t_{i-2})\,p(t_{i-2}) = \cdots = \prod_{j=1}^{i} p(t_j \mid t_{j-1}), \qquad (2)$$

where $p(t_1 \mid t_0) = p(t_1)$; $p(0) = 1$.

Observations  on  Uncensored  Data  at  Fixed  Times 

Let a sample of N individuals come under observation. They are all
observed from birth (or the appropriate event defining time zero) until
death. With the first approach, one preselects a series of times,
$0 < t_1 < t_2 < \cdots$, before examining the observed times of death.
In the medical follow-up example, one might select the times
corresponding to exactly 1, 2, 3, ... years after a surgical procedure
for cancer. An estimate of the conditional probability of survival to
$t_i$, given survival to $t_{i-1}$, is

$$\hat p(t_i \mid t_{i-1}) = \frac{N_i - r_i}{N_i},$$

where $N_i$ elements were present at the beginning of the interval,
i.e., at time $t_{i-1}$, and $r_i$ elements failed during the interval.

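For a set of data which is not censored, $N_i = N_{i-1} - r_{i-1}$,
with $N_1 = N$. Now replace probabilities by their estimates in (2):

$$\hat p(t_i) = \prod_{j=1}^{i} \hat p(t_j \mid t_{j-1}) = \prod_{j=1}^{i} \frac{N_j - r_j}{N_j} = \left(\frac{N - r_1}{N}\right)\left(\frac{N - r_1 - r_2}{N - r_1}\right)\cdots\left(\frac{N - r_1 - \cdots - r_i}{N - r_1 - \cdots - r_{i-1}}\right) = 1 - \frac{r_1 + r_2 + \cdots + r_i}{N}.$$

Now the estimate $\hat p(t_i)$ is of the form

$$\hat p(t_i) = \frac{N - (r_1 + r_2 + \cdots + r_i)}{N},$$

and this is the same as

$$\hat p(t_i) = \frac{s_i}{N},$$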


where $s_i$ is the number of the original group, of size N, that
survive to $t_i$. If it is assumed that the N individuals each have the
survival probability p(t), and that they die independently, then $S_i$,
the random number that survive to time $t_i$, is binomially
distributed, with $s_i$ being a realized value of $S_i$. Then,
considering the estimate as a random variable,

$$E[\hat p(t_i)] = \frac{N\,p(t_i)}{N} = p(t_i)$$

and

$$\operatorname{Var}[\hat p(t_i)] = \frac{p(t_i)\,[1 - p(t_i)]}{N}.$$

Consequently $\hat p(t_i)$ is an unbiased and consistent estimate of
$p(t_i)$.

This is true for every $t_i$, and can be shown to be true for all
$t_i$, $i = 1, 2, \ldots, I$, as well.

All of this indicates that the suggested estimate is likely to be a
good one if the sample size, N, is large.
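The unbiasedness and the binomial variance can be checked numerically. This Python sketch is an illustration under assumptions not in the text: lifetimes are exponential with unit mean, so $p(t) = e^{-t}$, with N = 10, t = 1 and a fixed random seed; the function name is hypothetical.

```python
import math
import random


def simulate_fixed_time_estimate(n, t, reps=20000, seed=0):
    """Monte Carlo mean and variance of p_hat(t) = S/N for Exp(1) lifetimes."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        # S = number of the n individuals whose lifetime exceeds t (a binomial count).
        s = sum(1 for _k in range(n) if rng.expovariate(1.0) > t)
        estimates.append(s / n)
    mean = sum(estimates) / reps
    var = sum((e - mean) ** 2 for e in estimates) / (reps - 1)
    return mean, var


n, t = 10, 1.0
p_true = math.exp(-t)  # survival function of the assumed Exp(1) lifetimes
mean, var = simulate_fixed_time_estimate(n, t)
print(f"mean {mean:.4f} vs p(t) {p_true:.4f}")
print(f"variance {var:.5f} vs p(1-p)/N {p_true * (1 - p_true) / n:.5f}")
```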

Clearly $\hat p(t_i) \le \hat p(t_{i-1})$. The survival probability,
p(t), is thus estimated at a fixed sequence of times. At each time
point, $t_i$ being a typical one, there are $r_i$ fewer survivors than
at $t_{i-1}$, where $r_i = 0, 1, 2, \ldots, N$. Consequently a plot of
$\hat p(t_i)$ shows a non-increasing step function, with downward steps
of varying sizes at $t_1, t_2, \ldots$.

If the above times are close together, and if the time of death T has
a density function, then one can anticipate seeing values of $r_i$
that are either zero or unity.

The so-called second approach is really a limiting case of the first,
as the lengths of the measurement intervals decrease indefinitely. Thus
each death (or loss) occurs as a single event.

When no losses take place, the case now considered, the time $t_i$ of
the ith death is really a realization of a random variable, denoted by
$\underline{t}_i$; this means that $p(\underline{t}_i)$, the
probability of surviving to $\underline{t}_i$, is a random variable. It
can be shown that the expected value of $p(\underline{t}_i)$ is

$$E[p(\underline{t}_i)] = \frac{N - i + 1}{N + 1}, \qquad i = 1, 2, \ldots, N,$$

where $\underline{t}_1 < \underline{t}_2 < \cdots < \underline{t}_N$.

The derivation involves integrating

$$E[p(\underline{t}_i)] = \int_0^\infty p(t)\,\frac{N!}{(i-1)!\,(N-i)!}\,[1 - p(t)]^{i-1}\left(-\frac{dp(t)}{dt}\right)[p(t)]^{N-i}\,dt = \frac{N - i + 1}{N + 1}$$

by transformation from p(t) to x: with $x = p(t)$ the integral becomes
the beta integral
$\frac{N!}{(i-1)!\,(N-i)!}\int_0^1 x^{N-i+1}(1-x)^{i-1}\,dx$; see
H. Cramér, Mathematical Methods of Statistics, Princeton University
Press, 1946.
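The result can be checked by simulation. This Python sketch assumes exponential lifetimes with unit mean, for which $p(t) = e^{-t}$, and compares the average of p evaluated at the ith ordered death time with $(N - i + 1)/(N + 1)$; the function name and Monte Carlo settings are illustrative.

```python
import math
import random


def mean_p_at_ith_death(n, i, reps=20000, seed=1):
    """Monte Carlo estimate of E[p(t_(i))] for n i.i.d. Exp(1) lifetimes,
    where t_(i) is the i-th smallest death time and p(t) = exp(-t)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        deaths = sorted(rng.expovariate(1.0) for _k in range(n))
        total += math.exp(-deaths[i - 1])
    return total / reps


n = 4
for i in range(1, n + 1):
    print(i, round(mean_p_at_ith_death(n, i), 3), (n - i + 1) / (n + 1))
```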




Thus one is led to use

$$\hat p(\underline{t}_i) = \frac{N - i + 1}{N + 1} \qquad (4)$$

as an estimate of the value of $p(t_i)$, $t_i$ being the ith time of
death.

Expression (4) provides an estimator of the survival function at the
times of observed deaths when there are no losses because of
censoring. The estimates at the points $t_1 < t_2 < \cdots < t_N$ can
be connected by straight lines, or a step function with step sizes
$1/(N+1)$ may be used. The estimators of equation (4) give intuitively
acceptable results. For example, if the sample consists of only a
single individual (N = 1), then death is equally likely to occur
before or after the time at which the true (but unknown) survival
function equals one half. Thus, the result of equation (4) is
reasonable:

$$E[p(\underline{t}_1)] = \tfrac{1}{2}.$$

The point estimates of the second approach always occur at the times
of discontinuity of the estimates from the first approach. For
example, consider a data base (N = 4) with deaths observed at times 1,
3, 4 and 7. The first approach gives the following step-function
estimate of the survival function:

$$\hat p(t) = \begin{cases} 1.0, & 0 \le t < 1 \\ 0.75, & 1 \le t < 3 \\ 0.5, & 3 \le t < 4 \\ 0.25, & 4 \le t < 7 \\ 0.0, & t \ge 7 \end{cases}$$
The second approach gives the following point estimates:

p(0) = 1.0
p(1) = 0.8
p(3) = 0.6
p(4) = 0.4
p(7) = 0.2
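Both sets of numbers for this example can be reproduced in a few lines. In this Python sketch (names illustrative) each point estimate from equation (4) is compared with the step-function heights just before and just after the same death, showing that it falls on the vertical face of the step.

```python
from fractions import Fraction

deaths = [1, 3, 4, 7]  # complete data, N = 4
n = len(deaths)


def step_estimate(t):
    """First approach: fraction of the N individuals still alive after time t."""
    return Fraction(sum(1 for d in deaths if d > t), n)


# Second approach, equation (4): the i-th death gets (N - i + 1)/(N + 1).
point = {d: Fraction(n - i + 1, n + 1) for i, d in enumerate(deaths, start=1)}

for d in deaths:
    # Step height just before the death, just after it, and the point estimate.
    print(d, float(step_estimate(d - 0.5)), float(step_estimate(d)), float(point[d]))
```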
A  graphic  comparison  of  the  results  of  the  two  approaches  is  given 
below: 


[Figure: step-function estimate (first approach) and point estimates (second approach) of p(t) versus t, 0 ≤ t ≤ 7.]

It is difficult to decide how to smooth out the step functions that
result from the first approach. By connecting the tops of the
"stairsteps," one places an upper bound on reasonable estimates. By
connecting the bottom corners of the stairsteps, one places a lower
bound on reasonable estimates. One might draw a smooth, decreasing
curve that passes through all (or almost all) of the vertical faces of
the step-function estimate. The second approach suggests a method of
selecting a unique point on each of these vertical segments.
Incomplete Observations

When some of the observations are incomplete, equation (4) requires
modification. The expected value of the survival function at the time
of the first observed death may be written:

$$E[p(\underline{t}_1)] = \frac{N_1}{N_1 + 1} \qquad (5)$$

Here $N_1$ is the effective size of the sample during the interval
$(0, t_1)$ terminated by the time observed for the first death. In the
special case of no censoring events, the value of $N_1$ is
unambiguous: it is equal to the initial sample size ($N_1 = N$). In
this case equation (5) reduces to equation (4) with i = 1.

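Subsequent point estimates for $t_2, t_3, \ldots$ may be calculated
iteratively:

$$E[p(\underline{t}_i)] = E[p(\underline{t}_{i-1})]\,\frac{N_i}{N_i + 1},$$

where $t_0 = 0$ and $N_i$ is the effective sample size over the time
interval $(t_{i-1}, t_i)$. Thus,

$$E[p(\underline{t}_i)] = \prod_{j=1}^{i} \frac{N_j}{N_j + 1}. \qquad (6)$$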
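Equation (6) is a running product and is easy to evaluate exactly. This Python sketch (hypothetical function name) leaves the choice of each effective sample size $N_j$ to the caller, and checks that with no censoring ($N_j = N - j + 1$) the product collapses to equation (4).

```python
from fractions import Fraction


def eq6_point_estimates(effective_sizes):
    """Equation (6): running product of N_j/(N_j + 1) over successive observed deaths.

    effective_sizes[j - 1] is N_j for the interval (t_{j-1}, t_j); how N_j is
    chosen when there is censoring is left to the caller.
    """
    estimates, running = [], Fraction(1)
    for n_j in map(Fraction, effective_sizes):
        running *= n_j / (n_j + 1)
        estimates.append(running)
    return estimates


# No censoring: N_j = N - j + 1, and (6) should collapse to (4), (N - i + 1)/(N + 1).
N = 4
print([str(e) for e in eq6_point_estimates([N - j + 1 for j in range(1, N + 1)])])
# ['4/5', '3/5', '2/5', '1/5']
```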

Variance  of  the  estimators 

Kaplan and Meier, reference (1), give an expression for the exact
calculation of the variance of step functions. They also discuss
"Greenwood's formula," a large-sample approximation that ignores terms
of order $1/N_i^2$.
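The formula itself is not reproduced here. For reference, its commonly quoted form for the product-limit estimate is $\operatorname{Var}[\hat p(t)] \approx \hat p(t)^2 \sum_{t_j \le t} d_j / [n_j (n_j - d_j)]$, with $n_j$ individuals under observation just before death time $t_j$ and $d_j$ deaths at $t_j$. The Python sketch below (hypothetical function name, standard textbook form rather than anything derived in this paper) evaluates it with exact fractions and checks that, for uncensored data, it coincides with the binomial variance derived in Section II.

```python
from fractions import Fraction


def greenwood_variance(at_risk, deaths):
    """Greenwood's approximation for the variance of the product-limit estimate.

    at_risk[j] = n_j under observation just before the j-th death time,
    deaths[j]  = d_j deaths at that time (requires n_j > d_j).
    Returns (p_hat, variance) at the last time supplied.
    """
    p_hat, total = Fraction(1), Fraction(0)
    for n_j, d_j in zip(at_risk, deaths):
        p_hat *= Fraction(n_j - d_j, n_j)
        total += Fraction(d_j, n_j * (n_j - d_j))
    return p_hat, p_hat ** 2 * total


# Uncensored check: N = 10, one death at each of the first three death times.
N = 10
p_hat, var = greenwood_variance([10, 9, 8], [1, 1, 1])
print(p_hat, var, p_hat * (1 - p_hat) / N)  # 7/10 21/1000 21/1000
```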
Herd, reference (2), presents without derivation an expression for the
variance of estimates using the second approach (point estimators):

$$V(t_i) = \operatorname{Var}\{E[p(\underline{t}_i)]\} = \prod_{j=1}^{i} \frac{N_j}{N_j + 2} - \prod_{j=1}^{i} \left(\frac{N_j}{N_j + 1}\right)^2.$$

The notation here follows that of the estimating equation (6).
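A small Python sketch (hypothetical function name) evaluates this expression with exact fractions. In the uncensored case, $N_j = N - j + 1$ and $p(\underline{t}_i)$ follows a beta distribution, so the formula can be checked: for N = 1 it gives 1/12, the variance of a uniform variable, and for N = 4, i = 2 it gives 1/25.

```python
from fractions import Fraction


def herd_mean_and_variance(effective_sizes):
    """Point estimate from equation (6) and the variance expression quoted from Herd:
    V = prod N_j/(N_j + 2) - [prod N_j/(N_j + 1)]^2 over the deaths so far."""
    first_product = Fraction(1)
    mean = Fraction(1)
    for n_j in map(Fraction, effective_sizes):
        first_product *= n_j / (n_j + 2)
        mean *= n_j / (n_j + 1)
    return mean, first_product - mean ** 2


print(herd_mean_and_variance([1]))     # (Fraction(1, 2), Fraction(1, 12))
print(herd_mean_and_variance([4, 3]))  # (Fraction(3, 5), Fraction(1, 25))
```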
III. THE ESTIMATORS

This section describes the nine non-parametric estimators and four
jackknife estimators of the survival probability. It also describes the
parametric estimator for an exponential decay function. Exponential
life distributions are the starting point for much of reliability
theory and practice. The estimator derived from the exponential is
regarded as "par" when the simulated data are based on an underlying
exponential decay distribution for deaths. Thus, when deaths are
exponentially distributed, the non-parametric estimators may be
compared with each other, and they may be compared with the parametric
estimator as a standard.

A hypothetical data base, consisting of five individuals, is used to
illustrate each of the estimators. This sample data base is as
follows:

Individual   Time of Death     Time of Truncation
A            1                 -
B            Unknown (> 2)     2
C            3                 -
D            Unknown (> 6)     6
E            7                 -

The data have been arranged in time sequence of the death and
truncation events. In the medical example, the data might indicate that
patients A, C and E were observed to die exactly 1, 3 and 7 years,
respectively, after their surgery. However, B and D moved away or
otherwise became unavailable to the observer at times 2 and 6,
respectively. Further, the cause of the unobservability is unrelated to
the patient's health and life expectancy.

A.   STEP-FUNCTION  ESTIMATORS 

1. The First Estimator, "$p_1(t)$"

$p_1(t)$ is a naive estimator; it is expected to perform poorly
relative to the other estimators. $p_1$ depends only on the data from
individuals whose deaths are observed. It ignores any information from
the partial lifetimes noted for the censored observations. $p_1(t)$ is
simply the fraction of individuals surviving to at least time t among
those individuals whose time of death is known. It is a step function:


t       p1(t)
0-1     1.0
1-3     0.667
3-7     0.333
7-∞     0.00

The naive estimator, $p_1(t)$, takes no account of the successful
survival intervals observed for the censored individuals. Therefore it
is biased in a downward (pessimistic) direction.
2. The Second Estimator, "$p_2(t)$"

$p_2(t)$ is the product-limit estimate. Kaplan and Meier, reference
(1), have shown that this is the maximum likelihood estimator. The
observed events, both deaths and truncations, are arranged in
increasing order of occurrence: $t_1, t_2, \ldots, t_N$, where N is
the number of individuals in the sample.

Let $p(t_i)$ denote the cumulative probability of survival of an
individual from time zero to time $t_i$. Let $p(t \mid t_i)$ denote the
conditional probability of surviving to time $t$ ($> t_i$), given that
the individual has already survived to time $t_i$. Then,


$$p_2(t_i) = p_2(t_{i-1}) \cdot p_2(t_i \mid t_{i-1}) \qquad (\text{E-1})$$

If we define $t_0 = 0$ and $p(0) = 1$, then

$$p_2(t_i) = \prod_{j=1}^{i} p_2(t_j \mid t_{j-1}) \qquad (\text{E-2})$$


The product-limit estimator is in the form of equation (E-2) with

$$p_2(t_j \mid t_{j-1}) = \begin{cases} 1, & \text{if the event at } t_j \text{ is a truncation,} \\[4pt] \dfrac{N_j - 1}{N_j}, & \text{if the event at } t_j \text{ is a death.} \end{cases} \qquad (\text{E-3})$$

Here $N_j$ is the number of individuals observed surviving in the
interval $t_{j-1} < t < t_j$. This formulation causes the
product-limit estimator to be insensitive to the exact times of the
censoring events.

The estimator is unity from time zero to the time of the first event,
$t_1$, reflecting the fact that all individuals in our example are
observed to live until at least time $t_1$.

- If the event at time $t_1$ is a truncation, then the estimator
  remains at unity until at least time $t_2$. Again, no deaths are
  observed in the sample before $t_2$.

- If the event at time $t_1$ is a death, then the estimator drops to
  $(N-1)/N$. This drop reflects the observed death of $1/N$ of the
  surviving sample just prior to $t_1$.

Values of the estimator $p_2$ are calculated iteratively at successive
values of $t_i$ ($i = 1, 2, \ldots, N$).
The size of the surviving sample declines as truncations and deaths
remove individuals from observation. For the hypothetical data base
listed above, one obtains:

t       p2(t)
0-1     5/5 = 1.0
1-2     4/5 = 0.8
2-3     (4/5) × (3/3) = 0.8
3-6     (4/5) × (2/3) = 0.533
6-7     (8/15) × (1/1) = 0.533
7-∞     (8/15) × (0/1) = 0.0
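The table can be reproduced with a short product-limit sketch in Python. The event encoding `(time, is_death)` and the function name are illustrative assumptions, and event times are assumed distinct and sorted, as they are in the hypothetical data base.

```python
from fractions import Fraction

# Hypothetical data base of Section III: (time, is_death).
# A dies at 1, B truncated at 2, C dies at 3, D truncated at 6, E dies at 7.
events = [(1, True), (2, False), (3, True), (6, False), (7, True)]


def product_limit(events):
    """Product-limit estimate p2 just after each event (distinct, sorted times)."""
    at_risk = len(events)
    p = Fraction(1)
    path = []
    for time, is_death in events:
        if is_death:
            p *= Fraction(at_risk - 1, at_risk)  # (N_j - 1)/N_j, equation (E-3)
        # A truncation multiplies by 1 (E-3); either way one individual leaves.
        at_risk -= 1
        path.append((time, p))
    return path


for time, p in product_limit(events):
    print(time, round(float(p), 3))  # 0.8, 0.8, 0.533, 0.533, 0.0
```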

The product-limit estimator explicitly accounts for the survival of the
censored individuals (up to the time of the last death before each
censoring event). Thus, $p_2(t)$ is a step function with a value that
is not less than $p_1(t)$ for any value of t. If the sample contains no
censoring, then $p_1(t)$ and $p_2(t)$ are identical.

If the last event in the sample is a truncation rather than a death,
the estimate changes. Suppose, for example, that individual E had
disappeared from the observer at time 6.5, so that the fact of E's
death at time 7 is unknown. The modified data give the following
estimate:

p2(t), modified data
t        p2(t)
0-1      1.0
1-3      0.8
3-6.5    0.533

Since the time of death for individual E is now unknown, one can only
estimate that

$$0 \le p_2(t) \le 0.533 \qquad \text{for } t > 6.5.$$
If the analyst is willing to assume a functional form for the survival
function, then he may calculate the manner in which the estimator
$p_2(t)$ decreases to zero. However, the data alone are insufficient
for this when a strictly non-parametric estimator is used.

The product-limit estimator is a useful and intuitively appealing
method of dealing with incomplete observations. It has been widely used
and studied. However, the product-limit estimator has one disturbing
characteristic:

Most of the biological, physical or other causes of death produce a
survival probability that decreases continuously in time. One may
therefore be a little uncomfortable estimating the survival probability
with a step function, and one is tempted to smooth the estimator to
make it a continuous, monotonically decreasing function of t.
3. The Third Estimator, "$p_3(t)$"

$p_3(t)$ is a modification of $p_2(t)$. Like $p_2(t)$, it is a step
function with discrete drops at those times corresponding to the
observed deaths in the sample population. It may also be expressed as
a product of conditional probabilities:

$$p_3(t_k) = \prod_{j=1}^{k} p_3(t_j \mid t_{j-1}), \qquad (\text{E-4})$$

where the $t_k$ are the times of observed deaths and $t_0$ is zero.
The conditional probabilities on the right-hand side of Equation (E-4)
differ somewhat from those in Equation (E-2):

$$p_3(t_k \mid t_{k-1}) = \frac{N_k - 1}{N_k}. \qquad (\text{E-5})$$
Equation (E-5) differs from Equation (E-3) in the interpretation of
the number of individuals at risk. Here, the value of $N_k$ is taken to
be the average number of individuals observed surviving in the
interval between the (k-1)st observed death and the kth observed
death. The number of observed survivors decreases at intermediate
times if events are censored, and hence the $N_k$ are not necessarily
integers. The value of $N_k$ is regarded as the effective sample size
for the interval from $t_{k-1}$ to $t_k$. In the sample data base shown
above, individual B is known to have survived from time 1 to time 2,
or half of the interval between the first death at t = 1 and the
second death at t = 3. Therefore, the estimator $p_3$ treats individual
B as half a participant in the interval between the deaths of
individuals A and C.

The effective sample size for this interval is then 3.5:

$$N_2 = 3 + \frac{2 - 1}{3 - 1} = 3.5$$

(full contributions from individuals C, D and E, plus a half
contribution from B). For our hypothetical data base, the following
values are calculated for $p_3$:

t       p3(t)
0-1     5/5 = 1.0
1-3     (4/5) × 1.0 = 0.8
3-7     (2.5/3.5) × 0.8 = 0.571
(7)     (1.75/2.75) × 0.571 = 0.364

The value of $p_3(t)$ can never be less than the corresponding value of
$p_2(t)$. In the special case with no censoring events the estimators
$p_1(t)$, $p_2(t)$ and $p_3(t)$ are identical.

One might perturb the data by shifting the time of B's truncation event
down to $1 + \varepsilon$ or up to $3 - \varepsilon$, with
$\varepsilon$ arbitrarily small. The dependence of the estimator $p_3$
upon the exact times of the censoring events may then be demonstrated.
For purposes of illustration, the time of the censoring event for
individual B ($t_B$) is decreased from 2 to 1.1, then increased to 2.9.

t       p3(t), t_B = 2    p3(t), t_B = 1.1    p3(t), t_B = 2.9
0-1     1.0               1.0                 1.0
1-3     0.80              0.80                0.80
3-7     0.571             0.538               0.597
(7)     0.364             0.342               0.380
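The time-averaged effective sample size can be sketched in Python (hypothetical function name). An individual truncated inside an interval counts the fraction of the interval for which it was observed, as described for $p_3$; counting a truncation that falls exactly at a death time as a full participant is an assumption of this sketch. The loop reproduces the 3-7 row of the table above for the three truncation times of B.

```python
from fractions import Fraction


def effective_sizes(deaths, censored):
    """Effective sample size N_k for each interval (t_{k-1}, t_k] between observed deaths.

    Everyone still under observation at t_k (including the individual who dies there)
    counts 1; an individual truncated inside the interval counts the fraction of the
    interval for which it was observed. Assumes deaths are sorted and distinct.
    """
    sizes, prev = [], Fraction(0)
    for t_k in map(Fraction, deaths):
        n = Fraction(sum(1 for d in deaths if d >= t_k) + sum(1 for c in censored if c >= t_k))
        n += sum((Fraction(c) - prev) / (t_k - prev) for c in censored if prev < c < t_k)
        sizes.append(n)
        prev = t_k
    return sizes


deaths = [1, 3, 7]
print(effective_sizes(deaths, [2, 6]))  # [Fraction(5, 1), Fraction(7, 2), Fraction(7, 4)]

# p3 on the interval 3-7: product of (N_k - 1)/N_k over the first two deaths, (E-5).
for t_b in ("2", "1.1", "2.9"):
    n1, n2, _ = effective_sizes(deaths, [Fraction(t_b), 6])
    p3_on_3_7 = (n1 - 1) / n1 * ((n2 - 1) / n2)
    print(t_b, round(float(p3_on_3_7), 3))  # 0.571, 0.538, 0.597
```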

This example demonstrates an intuitively appealing characteristic of
the estimator $p_3$. As the total observed survival time increases for
the individuals in our sample (with deaths held constant), the value of
the estimating function increases over at least a portion of its range.

We may safely assume that the true survival function eventually tends
to zero with time, since no physical or biological system lives
forever. However, there are no observations on the survival of
individuals beyond time 7. The data only indicate that our
step-function estimator drops to a value of 0.364 at t = 7; the
nonparametric estimator gives no information about the survival
function's subsequent decline from 0.364 to zero, because the data
alone are insufficient for this when a strictly nonparametric estimator
is used.

B. POINT ESTIMATORS

As mentioned above, the estimators $p_1$, $p_2$ and $p_3$ are somewhat
undesirable because they give step-function estimates for a continuous
survival function. The next three estimators, $p_4$, $p_5$ and $p_6$,
are modifications of the first three. They provide estimates of the
survival function only at those points in time that correspond to
observed deaths. These estimators are specified by Equations (E-2) and
(E-4), except for a substitution of the term (N+1) in place of (N);
that is, each conditional factor $(N-1)/N$ becomes $N/(N+1)$.

Since the point estimators have rigorous definitions at only discrete
points in time, it is necessary to offer an interpolation rule. That
is, we need a method of "connecting the dots." The method proposed here
is to assume that the survival function declines in a piecewise
exponential decay between the discrete points in time. This procedure
is equivalent to assuming that the hazard function is essentially
constant between a consecutive pair of the discrete times, but that the
hazard varies from one time period to the next. Such an assumption is
intuitively acceptable unless one suspects violent fluctuations in the
hazard function.
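A minimal Python sketch of this interpolation rule (hypothetical function name): on each interval the curve is $p_k \exp[(t - t_k)/(t_{k+1} - t_k)\,\ln(p_{k+1}/p_k)]$, i.e. a constant hazard between consecutive points. For illustration it uses the $p_4$ point estimates of the hypothetical data base (1.0, 0.75, 0.5 and 0.25 at times 0, 1, 3 and 7); it does not handle a final estimate of zero, where the logarithm is undefined.

```python
import math


def piecewise_exponential(times, values, t):
    """Interpolate point estimates (times[k], values[k]) with a constant hazard
    on each interval: p(t) = p_k * exp[(t - t_k)/(t_{k+1} - t_k) * ln(p_{k+1}/p_k)].
    times must start at 0 with values[0] = 1, and all values must be positive."""
    for k in range(len(times) - 1):
        t0, t1 = times[k], times[k + 1]
        if t0 <= t <= t1:
            p0, p1 = values[k], values[k + 1]
            return p0 * math.exp((t - t0) / (t1 - t0) * math.log(p1 / p0))
    raise ValueError("t outside the range of the point estimates")


times, p4 = [0, 1, 3, 7], [1.0, 0.75, 0.5, 0.25]
print(round(piecewise_exponential(times, p4, 2.0), 4))  # 0.75 * (2/3) ** 0.5
```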

1. The Estimator, "$p_4(t)$"

$p_4(t)$ is analogous to $p_1(t)$ in that only those individuals
observed to die are included in the sample. These two estimators are
naive because they suppress all data from the survival times of
individuals removed from observation by censoring.

These estimates, i.e., $p_1(t)$ and $p_4(t)$, tend to ignore
information from the more long-lived individuals in the sample, and
they may be expected to give biased estimates of the survival function.

The point estimator $p_4(t)$ gives the following values with the
sample data base presented earlier in this section:


t    p4(t)                     Interval   Interpolated p4(t)
0    1.0                       0-1        $e^{t\ln(3/4)}$
1    3/4 = 0.75                1-3        $(3/4)\,e^{\frac{t-1}{2}\ln(2/3)}$
3    (2/3) × 0.75 = 0.5        3-7        $(1/2)\,e^{\frac{t-3}{4}\ln(1/2)}$
7    (1/2) × 0.5 = 0.25
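The interpolations for connecting the dots are as follows:

[Figure: piecewise exponential interpolation through the point estimates of p(t); axes p(t) versus t.]

On each interval the survival function is taken to be exponential with
its own time constant $\tau_k$:

$$p(t) = \begin{cases} e^{-t/\tau_1}, & 0 \le t \le 1 \\ p(t_1)\,e^{-(t-1)/\tau_2}, & 1 \le t \le 3 \\ p(t_2)\,e^{-(t-3)/\tau_3}, & 3 \le t \le 7 \end{cases}$$

The time constants follow from matching the point estimates at
$t_1 = 1$, $t_2 = 3$ and $t_3 = 7$:

$$\tfrac{3}{4} = e^{-1/\tau_1} \;\Rightarrow\; \frac{1}{\tau_1} = -\ln\tfrac{3}{4}, \qquad \tfrac{1}{2} = \tfrac{3}{4}\,e^{-2/\tau_2} \;\Rightarrow\; \frac{1}{\tau_2} = -\tfrac{1}{2}\ln\tfrac{2}{3}, \qquad \tfrac{1}{4} = \tfrac{1}{2}\,e^{-4/\tau_3} \;\Rightarrow\; \frac{1}{\tau_3} = -\tfrac{1}{4}\ln\tfrac{1}{2},$$

so that

$$p_4(t) = \begin{cases} e^{t\ln(3/4)}, & 0 \le t \le 1 \\ \tfrac{3}{4}\,e^{\frac{t-1}{2}\ln(2/3)}, & 1 \le t \le 3 \\ \tfrac{1}{2}\,e^{\frac{t-3}{4}\ln(1/2)}, & 3 \le t \le 7 \end{cases}$$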

2. The Estimator, "$p_5(t)$"

The estimator $p_5(t)$ similarly corresponds to the product-limit
estimator $p_2(t)$. These two estimators use information from the
individuals on whom there are censored observations. $p_5$, like
$p_2$, does not exploit information about that portion of a censored
observation after the death event (of some other individual) preceding
the censoring event.
For our hypothetical data base the following values are calculated for
$p_5(t)$:

t    p5(t)                     Interval   Interpolated p5(t)
0    1.0                       0-1        $e^{t\ln(5/6)}$
1    5/6 = 0.833               1-3        $(5/6)\,e^{\frac{t-1}{2}\ln(3/4)}$
3    (3/4) × 0.833 = 0.625     3-7        $(0.625)\,e^{\frac{t-3}{4}\ln(1/2)}$
7    (1/2) × 0.625 = 0.312

Whenever censored observations are present, the estimator $p_4(t)$
never exceeds $p_5(t)$.

For $p_5(t)$, the value of $N_i$ is taken to be the number of
surviving individuals in the sample just before the observation of the
ith death. This value is smaller than the number of surviving
individuals just after the (i-1)st death if any truncation events occur
in the interval. In fact, $N_i$ is the smallest number of surviving
individuals observed at any time during the interval
$(t_{i-1}, t_i)$. Thus $p_5$ might be expected to introduce a bias by
using values of $N_i$ that are, on the average, too small. However,
this bias would be much less severe than the bias anticipated for the
estimator $p_4(t)$.

The estimators $p_4$ and $p_5$ are insensitive to the precise times of
the censoring events. Moving the censoring event for individual B to
$1 + \varepsilon$ or $3 - \varepsilon$, with $\varepsilon$ arbitrarily
small, does not alter the estimates from $p_4$ and $p_5$ given above.
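Both $p_4$ and $p_5$ can be computed by one routine in a short Python sketch (hypothetical function name, same illustrative `(time, is_death)` encoding as for the product-limit estimator). At each death the running product is multiplied by $N_i/(N_i + 1)$, with $N_i$ the number still under observation just before that death; $p_4$ is obtained by first discarding the censored individuals. Truncation times never enter the arithmetic, only their order relative to deaths, which is why these two estimators are insensitive to the exact censoring times.

```python
from fractions import Fraction


def point_estimator_at_risk(events):
    """At each death multiply by N_i/(N_i + 1), N_i = number still under
    observation just before that death. events: sorted (time, is_death) pairs."""
    at_risk, p, out = len(events), Fraction(1), []
    for time, is_death in events:
        if is_death:
            p *= Fraction(at_risk, at_risk + 1)
            out.append((time, p))
        at_risk -= 1
    return out


events = [(1, True), (2, False), (3, True), (6, False), (7, True)]
p5 = point_estimator_at_risk(events)
p4 = point_estimator_at_risk([e for e in events if e[1]])  # deaths only
print([(t, float(p)) for t, p in p4])  # [(1, 0.75), (3, 0.5), (7, 0.25)]
print([(t, str(p)) for t, p in p5])    # [(1, '5/6'), (3, '5/8'), (7, '5/16')]
```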
3. The Estimator, "$p_6(t)$"

The estimator $p_6(t)$ corresponds to $p_3(t)$ by accounting for all of
the survival time for the truncated observations. For our hypothetical
data base, the following values are calculated for $p_6(t)$:

t    p6(t)                          Interval   Interpolated p6(t)
0    1.0                            0-1        $e^{t\ln(5/6)}$
1    5/6 = 0.833                    1-3        $(5/6)\,e^{\frac{t-1}{2}\ln(3.5/4.5)}$
3    (3.5/4.5) × 0.833 = 0.648      3-7        $(0.648)\,e^{\frac{t-3}{4}\ln(1.75/2.75)}$
7    (1.75/2.75) × 0.648 = 0.412

The estimator $p_6(t)$, like $p_3(t)$, is based on the average number
of surviving individuals noted in the various time intervals. These
estimators give part credit for individuals whose lifetimes are
censored in mid-interval. The value of $N_i$ for $p_6(t)$ is an
unweighted time average. If the observation of an individual is
truncated after 23% of the interval has elapsed, then that individual
contributes a value of 0.23 to $N_i$. Individuals who are observed to
survive the entire interval, and the individual whose death terminates
the interval, each contribute a value of 1.0 to $N_i$. This
interpretation of the effective sample size is appropriate if the
hazard is approximately constant over the interval. If the hazard
function changes markedly within a time interval containing censored
events, then this interpretation of the effective sample size is
biased. Therefore, the procedure for determining the value of $N_i$
for the estimator $p_6(t)$ is based on the implicit assumption that the
survival function is locally exponential. If the hazard function may be
assumed to vary slowly over each of the time intervals
$(t_{i-1}, t_i)$, then $p_6$ would appear to be based on an acceptable
approximation.
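A Python sketch of $p_6$ combines the time-averaged effective sample size just described with the factor $N_k/(N_k + 1)$ at each death. Function names are hypothetical, and counting a truncation that falls exactly at a death time as a full participant is an assumption of the sketch. With B truncated at 2, 1.1 and 2.9 it reproduces the three columns of the $p_6$ table that follows.

```python
from fractions import Fraction


def effective_sizes(deaths, censored):
    """Unweighted time average of the number under observation in each interval
    between successive observed deaths (deaths sorted and distinct)."""
    sizes, prev = [], Fraction(0)
    for t_k in map(Fraction, deaths):
        n = Fraction(sum(1 for d in deaths if d >= t_k) + sum(1 for c in censored if c >= t_k))
        n += sum((Fraction(c) - prev) / (t_k - prev) for c in censored if prev < c < t_k)
        sizes.append(n)
        prev = t_k
    return sizes


def p6_values(deaths, censored):
    """Point estimator p6 at each death: running product of N_k/(N_k + 1)."""
    p, out = Fraction(1), []
    for n_k in effective_sizes(deaths, censored):
        p *= n_k / (n_k + 1)
        out.append(p)
    return out


for t_b in ("2", "1.1", "2.9"):
    vals = p6_values([1, 3, 7], [Fraction(t_b), 6])
    print(t_b, [round(float(v), 3) for v in vals])
```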

The estimator $p_6$, like $p_3$, depends on the precise times of all
deaths and censoring events.

t    p6(t), t_B = 2                 p6(t), t_B = 1.1               p6(t), t_B = 2.9
0    1.0                            1.0                            1.0
1    5/6 = 0.833                    5/6 = 0.833                    5/6 = 0.833
3    (3.5/4.5) × 0.833 = 0.648      (3.05/4.05) × 0.833 = 0.628    (3.95/4.95) × 0.833 = 0.665
7    (1.75/2.75) × 0.648 = 0.412    (1.75/2.75) × 0.628 = 0.399    (1.75/2.75) × 0.665 = 0.423

This  illustrates  that  an  increase  (or  decrease)  in  the  total  observed 
survival  time  causes  an  increase  (or  decrease)  in  the  estimate  p^  over 
at  least  some  of  its  time  range. 

If the last event is a censoring event rather than an observed death, these estimators also require a definition for the period that starts at the time of the last death and ends at the time of the final censoring event.

The method proposed here for p_5(t) and p_6(t) is to continue the exponential function used in the interval terminated by the time of the last death. This procedure can be illustrated with the modified data
base  used  above  in  the  discussion  of  p  and  p  . 

C. THE BAYESIAN ESTIMATORS

Consideration is next given to quasi-Bayesian estimators based on a uniform prior distribution on the unit interval. Let $X_1,\dots,X_N$ be the true survival times of N individuals, which are censored on the right by N follow-up times $Y_1,\dots,Y_N$. It is assumed that the $X_i$ are independent, identically distributed random variables with a common distribution, and we wish to estimate the survival function

$$p(t) = \Pr(X > t).$$

However, we only have available the data

$$Z_i = \min\{X_i, Y_i\}, \qquad \delta_i = \begin{cases} 1 & \text{if } X_i \le Y_i, \\ 0 & \text{if } X_i > Y_i, \end{cases} \qquad i = 1,\dots,N.$$

If $\delta_i = 0$, then $Z_i$ is called "a loss", and if $\delta_i = 1$, then $Z_i$ is called "a death".

At a fixed time t, treat survival beyond t as a "success", so that $\Pr[\delta_i = 1] = \Pr[X_i > t] = p(t)$, $i = 1,\dots,N$. The maximum likelihood estimator for p(t) is

$$\hat p(t) = \frac{s}{N}, \qquad s = \sum_{i=1}^{N}\delta_i,$$

where s is the number of successful tests. The statistic s has the binomial distribution

$$P(s \mid p) = \binom{N}{s} p^{s}(1-p)^{N-s}, \qquad s = 0,1,\dots,N, \quad 0 < p < 1,$$

and the prior density of p is uniform:

$$f(p) = 1, \qquad 0 < p < 1.$$

The joint density of s and p is

$$f_{s,p}(s,p) = \binom{N}{s} p^{s}(1-p)^{N-s}, \qquad 0 < p < 1, \quad s = 0,1,\dots,N.$$

The marginal distribution of s is

$$P_s(s) = \int_0^1 \binom{N}{s} p^{s}(1-p)^{N-s}\,dp = \binom{N}{s}\frac{s!\,(N-s)!}{(N+1)!} = \frac{1}{N+1}, \qquad s = 0,1,\dots,N.$$

Thus, averaging over the values of p, all of which are assumed to be equally likely, the values of s are equally likely to occur. The posterior for p then is

$$f_{p \mid s}(p) = \frac{\Gamma(N+2)}{\Gamma(s+1)\,\Gamma(N-s+1)}\, p^{s}(1-p)^{N-s}, \qquad 0 < p < 1,$$

a beta density with parameters s + 1 and N - s + 1. The mean of the posterior is (s + 1)/(N + 2) and the mode (maximum value) of the posterior is s/N; thus the Bayes estimate of p (given s survivors in the sample of N) is

$$p^{*}(t) = \frac{s+1}{N+2}. \qquad \text{(C-1)}$$

Equation (C-1) yields a step function, and it shows that the uniform prior has the effect of adding two individuals to the population at risk, one dying at time zero and the other essentially immortal.
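The derivation can be checked with exact rational arithmetic. This minimal sketch (function names are illustrative, not from the program listing) confirms that the marginal of s is uniform, and that the posterior mean (C-1) equals the ordinary proportion s/N computed after adding one extra survivor and one extra death to the sample.

```python
from fractions import Fraction
from math import comb, factorial

def marginal(s, n):
    """Marginal P_s(s) under a uniform prior: C(n, s) s! (n - s)! / (n + 1)!."""
    return Fraction(comb(n, s) * factorial(s) * factorial(n - s), factorial(n + 1))

def posterior_mean(s, n):
    """Mean (s + 1) / (n + 2) of the Beta(s + 1, n - s + 1) posterior, eq. (C-1)."""
    return Fraction(s + 1, n + 2)

def posterior_mode(s, n):
    """Mode s / n of the posterior, which equals the maximum likelihood estimate."""
    return Fraction(s, n)

n = 5
print([marginal(s, n) for s in range(n + 1)])        # every entry is 1/(n + 1)
print([posterior_mean(s, n) for s in range(n + 1)])
# The Bayes estimate for (s, n) equals the MLE for (s + 1, n + 2):
print(all(posterior_mean(s, n) == posterior_mode(s + 1, n + 2) for s in range(n + 1)))
```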

The Bayesian estimators based on a uniform prior distribution on the unit interval are denoted p_11(t), p_12(t) and p_13(t); they correspond, respectively, to the estimators p_1(t), p_2(t) and p_3(t). The sample data base thus gives the following estimates of the survival function:

t      p_11(t)       p_12(t)                   p_13(t)
0-1    4/5 = 0.8     6/7 = 0.857               6/7 = 0.857
1-3    3/5 = 0.6     (5/6) × 0.857 = 0.714     (5/6) × 0.857 = 0.714
3-7    2/5 = 0.4     (3/4) × 0.714 = 0.536     (3.5/4.5) × 0.714 = 0.556
(7)    1/5 = 0.2     (1/2) × 0.536 = 0.268     (1.75/2.75) × 0.556 = 0.354

At the time of the final event (whether a death or a truncation), these step-function estimators drop to some positive value. Again, we have no data to indicate how the survival function proceeds to zero at subsequent times.
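The p_12(t) column can be reproduced by running an ordinary product-limit computation on the sample data base after adding the two pseudo-individuals implied by (C-1): one dying at time 0 and one never dying. This is a sketch with hypothetical helper names, written from the standard Kaplan-Meier recipe and using the usual convention that deaths are processed before censorings at tied times.

```python
import math

def product_limit(times, died):
    """Kaplan-Meier (product-limit) estimate.

    times: observation times; died: True for a death, False for a censoring.
    Returns a list of (death time, estimate just after that time).
    At tied times, deaths are processed before censorings.
    """
    order = sorted(zip(times, died), key=lambda x: (x[0], not x[1]))
    at_risk, p, steps, i = len(order), 1.0, [], 0
    while i < len(order):
        t, deaths, removed = order[i][0], 0, 0
        while i < len(order) and order[i][0] == t:
            deaths += order[i][1]
            removed += 1
            i += 1
        if deaths:
            p *= (at_risk - deaths) / at_risk
            steps.append((t, p))
        at_risk -= removed
    return steps

def bayes_product_limit(times, died):
    """p_12 sketch: add one individual dying at time 0 and one never dying."""
    return product_limit([0.0] + list(times) + [math.inf],
                         [True] + list(died) + [False])

# Sample data base: deaths at 1, 3, 7; censoring at 2 and 6.
times = [1, 2, 3, 6, 7]
died = [True, False, True, False, True]
for t, p in bayes_product_limit(times, died):
    print(t, round(p, 3))   # 0.0 0.857 / 1 0.714 / 3 0.536 / 7 0.268
```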

D.   THE  JACKKNIFE  ESTIMATOR 

We will assume that we have observed, or have generated in a simulation, a survival probability p(t_j), j = 1, ..., n, from various sample sizes. Furthermore, we have some parameter or characteristic p(t_j) of the sample which we wish to estimate with an estimator $\hat p(t_j)$. The jackknife estimator $\tilde p(t,n)$ described below is an approximately unbiased estimator of p(t_j). A modification of it has other useful properties.

Let $\hat p_{-i}(t,n-1)$ be the estimator computed from the sample of n of the X_i's with the ith value deleted from the sample. Then

$$\tilde p_i(t,n) = n\,\hat p(t,n) - (n-1)\,\hat p_{-i}(t,n-1), \qquad i = 1,\dots,n,$$

$$\tilde p(t,n) = \frac{1}{n}\sum_{i=1}^{n}\tilde p_i(t,n) = n\,\hat p(t,n) - \frac{n-1}{n}\sum_{i=1}^{n}\hat p_{-i}(t,n-1),$$

and the $\tilde p_i(t,n)$ are called the PSEUDO-values.

The PSEUDO-values can be used to obtain variance estimates of $\tilde p(t,n)$ and to set approximate confidence limits, using Student's t.

The idea is that the PSEUDO-values will be approximately independently and normally distributed. The jackknife estimator $\tilde p(t,n)$ is a sample average, so we form an estimate $S^2_{\tilde p(t,n)}$ of its variance from the following relationship (Miller, 1974):

$$S^2 = \frac{\sum_{i=1}^{n}\tilde p_i^{\,2}(t,n) - \frac{1}{n}\left(\sum_{i=1}^{n}\tilde p_i(t,n)\right)^2}{n-1}, \qquad S^2_{\tilde p(t,n)} = \frac{S^2}{n}.$$

This procedure is particularly useful if the number of data points is small, but it must be used with care. Note that the estimator $\tilde p(t,n)$ is designed to eliminate a 1/n bias term in the estimator $\hat p(t,n)$. Of course, the computations for the complete jackknife can be quite onerous, especially if $\hat p$ were, say, a complicated maximum likelihood estimator. Miller, reference (4), has shown that the product limit estimator is its own jackknife.
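The pseudo-value recipe above translates directly into a generic function. In this minimal sketch (names are illustrative), any estimator mapping a list of observations to a number can be jackknifed. Two standard checks are included: the pseudo-values of the sample mean are the observations themselves, and jackknifing the variance estimator with divisor n removes its 1/n bias exactly, giving the usual divisor n - 1 sample variance.

```python
from statistics import fmean, variance

def jackknife(data, estimator):
    """Return (jackknife estimate, pseudo-values, estimated variance).

    Pseudo-values: n * est(all) - (n - 1) * est(all but the i-th point).
    The variance of the jackknife estimate is S^2 / n, with S^2 the
    sample variance (divisor n - 1) of the pseudo-values.
    """
    data = list(data)
    n = len(data)
    full = estimator(data)
    pseudo = [n * full - (n - 1) * estimator(data[:i] + data[i + 1:])
              for i in range(n)]
    return fmean(pseudo), pseudo, variance(pseudo) / n

def plug_in_variance(xs):
    """Variance with divisor n: biased downward by a term of order 1/n."""
    m = fmean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
print(jackknife(data, fmean)[1])                               # the data, up to rounding
print(jackknife(data, plug_in_variance)[0], variance(data))    # bias removed
```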
Logistic Transformation

Although one can legitimately jackknife the Kaplan-Meier estimate directly, there is some reason to believe that a preliminary transformation will give improved results. Consequently, consider the transformation

$$\ell = \ln\frac{p(t)}{1-p(t)},$$

and notice that, whereas p(t) ranges from zero to unity, this transformation makes ℓ range from $-\infty$ to $\infty$. The procedure used is as follows.

(A) Compute the overall estimate at a time point t, using all N data points and a "continuity" correction that removes the effect of a zero in the logarithm (see D.R. Cox, Analysis of Binary Data, Methuen Monograph):

$$\ell_N = \ln\frac{\hat p_N(t) + \frac{1}{2N}}{1 - \hat p_N(t) + \frac{1}{2N}}.$$

(B) Compute the ℓ-values $\ell_{N-1,-i}$ by leaving out each data point in turn when computing $\hat p(t)$, for i = 1, 2, ..., N.

(C) Form the pseudo-values

$$z_i = N\ell_N - (N-1)\,\ell_{N-1,-i}.$$

(D) Compute $\bar z$ and $S^2$.

(E) Put approximate (1 - α)·100% confidence limits on E[ℓ] as follows:

$$L < E[\ell] < H, \qquad H,\ L = \bar z \pm t_{1-\alpha/2}(N-1)\sqrt{S^2/N}.$$

(F) Transform back to obtain

$$\frac{e^{L}}{1+e^{L}} \quad\text{and}\quad \frac{e^{H}}{1+e^{H}}.$$

The true value, p(t), should be enclosed between these levels for roughly (1 - α)·100% of all samples. The coverage properties of this procedure will now be checked by simulation: successive samples of size N will be selected, the jackknife limits H and L will be computed for each, and a check will be made as to whether

$$\frac{e^{L}}{1+e^{L}} < p(t) < \frac{e^{H}}{1+e^{H}}$$

or not. Tables illustrating performance are given subsequently.
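Steps (A) through (F) can be sketched for the product-limit estimate as follows. This is an illustrative implementation, not the program of Appendix B. The function names are hypothetical. The t quantile is passed in rather than computed, and step (F) is applied literally as e^x/(1 + e^x).

```python
import math

def product_limit_at(t, times, died):
    """Product-limit estimate of p(t) = Pr(X > t); deaths precede censorings at ties."""
    p = 1.0
    for u in sorted({x for x, d in zip(times, died) if d and x <= t}):
        at_risk = sum(1 for x in times if x >= u)
        deaths = sum(1 for x, d in zip(times, died) if d and x == u)
        p *= 1.0 - deaths / at_risk
    return p

def corrected_logit(p, m):
    """Logit with the continuity correction 1/(2m) used in steps (A) and (B)."""
    c = 1.0 / (2 * m)
    return math.log((p + c) / (1.0 - p + c))

def logistic_jackknife_ci(t, times, died, t_quantile):
    """Steps (A)-(F) for the product-limit estimate at time t.

    t_quantile is t_{1-alpha/2}(N-1), supplied by the caller.
    Returns (lower, upper) on the probability scale.
    """
    n = len(times)
    ell_n = corrected_logit(product_limit_at(t, times, died), n)        # (A)
    pseudo = []
    for i in range(n):                                                   # (B), (C)
        ts, ds = times[:i] + times[i + 1:], died[:i] + died[i + 1:]
        ell_i = corrected_logit(product_limit_at(t, ts, ds), n - 1)
        pseudo.append(n * ell_n - (n - 1) * ell_i)
    z_bar = sum(pseudo) / n                                              # (D)
    s2 = sum((z - z_bar) ** 2 for z in pseudo) / (n - 1)
    half = t_quantile * math.sqrt(s2 / n)                                # (E)
    low, high = z_bar - half, z_bar + half
    return 1 / (1 + math.exp(-low)), 1 / (1 + math.exp(-high))          # (F)

times = [1, 2, 3, 6, 7]                     # sample data base
died = [True, False, True, False, True]
print(logistic_jackknife_ci(3, times, died, t_quantile=2.776))   # N = 5, t_{0.975}(4)
```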
Let $\ell_{N-1,-i}$ be the logistic transformation estimator from the sample of N of the X_i's with the ith value deleted from the sample:

$$\ell_{N-1,-i} = \ln\frac{\hat p_{N-1,-i}(t) + \frac{1}{2(N-1)}}{1 - \hat p_{N-1,-i}(t) + \frac{1}{2(N-1)}}.$$

For the sample data base (N = 5), the values of $\ell_{N-1,-i}$ are:

        i = 1    i = 2    i = 3    i = 4    i = 5
t_1     3.04     0.98     0.98     0.98     0.98
t_2     3.04     0.98     0.98     0.98     0.98
t_3     0.63     0        0.98    -0.46    -0.46
t_4     0.63     0        0.98    -0.46    -0.46
t_5    -3.04    -3.04    -3.04    -3.04    -1.89

The pseudo-values are

$$z_i = N\ell_N - (N-1)\,\ell_{N-1,-i} = N\ln\frac{\hat p_N(t) + \frac{1}{2N}}{1 - \hat p_N(t) + \frac{1}{2N}} - (N-1)\ln\frac{\hat p_{N-1,-i}(t) + \frac{1}{2(N-1)}}{1 - \hat p_{N-1,-i}(t) + \frac{1}{2(N-1)}}.$$

The $z_i(t)$ are called the PSEUDO-values of the logistic transformation; the following values are calculated:

        i = 1      i = 2      i = 3      i = 4      i = 5
t_1    -6.05       2.198      2.198      2.198      2.198
t_2    -6.05       2.198      2.198      2.198      2.198
t_3    -1.9        0.606     -3.314      2.446      2.446
t_4    -1.9        0.606     -3.314      2.446      2.446
t_5    -3.0626    -3.0626    -3.0626    -3.0626    -7.162
Average of the pseudo-values:

$$\bar z = \frac{1}{N}\sum_{i=1}^{N} z_i.$$

Inverting the transformation gives the jackknife estimator of the logistic transformation:

$$\tilde p(t) = \left(1 + \frac{1}{N}\right)\frac{e^{\bar z}}{1 + e^{\bar z}} - \frac{1}{2N}.$$

Variance of the z_i:

$$S^2 = \widehat{\mathrm{Var}}(z) = \frac{1}{N-1}\sum_{i=1}^{N}\left(z_i - \bar z\right)^2.$$

The following values are calculated:

t      z-bar     jackknife p(t)    Var_J
t_1    0.5484    0.646             13.6
t_2    0.5484    0.646             13.6
t_3    0.0568    0.516             6.727
t_4    0.0568    0.516             6.727
t_5   -3.882     0                 3.361
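The z-bar and Var_J columns follow directly from the pseudo-value table. The sketch below recomputes them for t_1, t_3 and t_5. The helper name is hypothetical. The continuity correction c is left as a parameter (the text writes 1/(2N)), so the back-transformed value need not agree exactly with the tabled jackknife p(t).

```python
import math
from statistics import fmean, variance

# Pseudo-values z_i (N = 5) for t_1, t_3 and t_5, copied from the table above.
PSEUDO = {
    "t_1": [-6.05, 2.198, 2.198, 2.198, 2.198],
    "t_3": [-1.9, 0.606, -3.314, 2.446, 2.446],
    "t_5": [-3.0626, -3.0626, -3.0626, -3.0626, -7.162],
}

def summarize(z, c):
    """Return (z_bar, S^2, p_tilde) for one row of pseudo-values.

    p_tilde inverts ln[(p + c)/(1 - p + c)] = z_bar, i.e.
    p = (1 + 2c) e^z / (1 + e^z) - c, which is the inversion above when c = 1/(2N).
    """
    z_bar = fmean(z)
    s2 = variance(z)                       # divisor N - 1
    p = (1 + 2 * c) * math.exp(z_bar) / (1 + math.exp(z_bar)) - c
    return z_bar, s2, p

for label, z in PSEUDO.items():
    z_bar, s2, p = summarize(z, c=1 / (2 * len(z)))
    print(label, round(z_bar, 4), round(s2, 3), round(p, 3))
```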

Using the jackknife to estimate variability and construct confidence intervals

Tukey, reference (3), has suggested that in the jackknife procedure we consider the pseudo-values $z_i(N)$ as approximately independent and identically distributed; consequently, since $\bar z$ is an average of the $z_i(N)$, we proceed as if

$$W = \frac{\bar z - \ell}{\sqrt{\sum_{i=1}^{N}\left(z_i - \bar z\right)^2 / [N(N-1)]}}$$

has a t-distribution with N - 1 degrees of freedom.

If the $z_i$ are approximately normal variates (as Miller has shown), confidence bands for the unknown p(t) are given, as for the mean of any normal variate estimated from a sample of size N, by

$$\bar z \pm \frac{S}{\sqrt N}\,t_{1-\alpha/2}(N-1), \qquad \text{(D-1)}$$

i.e.,

$$\bar z - \frac{S}{\sqrt N}\,t_{1-\alpha/2}(N-1) < \ln\frac{p(t) + \frac{1}{2N}}{1 - p(t) + \frac{1}{2N}} < \bar z + \frac{S}{\sqrt N}\,t_{1-\alpha/2}(N-1).$$

Writing

$$L(N) = \bar z - \frac{S}{\sqrt N}\,t_{1-\alpha/2}(N-1), \qquad U(N) = \bar z + \frac{S}{\sqrt N}\,t_{1-\alpha/2}(N-1),$$

the limits for p(t) are

$$\left(1 + \frac{1}{N}\right)\frac{e^{L(N)}}{1 + e^{L(N)}} - \frac{1}{2N} < p(t) < \left(1 + \frac{1}{N}\right)\frac{e^{U(N)}}{1 + e^{U(N)}} - \frac{1}{2N}.$$

The following values are calculated, with $t_{1-\alpha/2}(4) = 2.776$:

t      Lower limit    Upper limit
t_1    0              1.0
t_2    0              1.0
t_3    0              1.0
t_4    0              1.0
t_5    0              0.14
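Limits of the kind tabulated can be computed from the summaries z-bar and S^2 alone. In this sketch (hypothetical helper name), the logit-scale interval (D-1) is mapped back with the inversion formula. The tabled 0 and 1.0 suggest that values outside [0, 1] were truncated, and the sketch assumes that. The upper limit at t_5 is sensitive to the continuity correction c, which is left as a parameter.

```python
import math

def logit_jackknife_limits(z_bar, s2, n, t_quantile, c):
    """Confidence limits (D-1) for p(t) from pseudo-value summaries.

    Logit-scale limits z_bar -/+ t * sqrt(s2 / n) are mapped back with
    p = (1 + 2c) e^x / (1 + e^x) - c and clipped to [0, 1] (an assumption).
    """
    half = t_quantile * math.sqrt(s2 / n)

    def back(x):
        p = (1 + 2 * c) * math.exp(x) / (1 + math.exp(x)) - c
        return min(1.0, max(0.0, p))

    return back(z_bar - half), back(z_bar + half)

# Summaries for t_1 and t_5 from the tables above; N = 5, t_{0.975}(4) = 2.776.
for z_bar, s2 in [(0.5484, 13.6), (-3.882, 3.361)]:
    print(logit_jackknife_limits(z_bar, s2, n=5, t_quantile=2.776, c=1 / (2 * 5)))
```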
The basis for this leap of the imagination seems to be that, when the estimator is simply a sample mean, the procedure for obtaining confidence intervals from equation (D-1) and the pseudo-values reduces to the usual t-interval for a mean. Indeed, if $\ell_N = \bar x = \frac{1}{N}\sum_{i=1}^{N} x_i$, we have

$$z_i = N\ell_N - (N-1)\,\ell_{N-1,-i} = N\bar x - (N-1)\,\frac{\left(\sum_{j=1}^{N} x_j\right) - x_i}{N-1} = \sum_{j=1}^{N} x_j - \left[\sum_{j=1}^{N} x_j\right] + x_i = x_i.$$

Thus the pseudo-values are

$$z_i = x_i \qquad\text{and}\qquad \bar z = \frac{1}{N}\sum_{i=1}^{N} x_i = \bar x.$$

In this case the pseudo-values are just the observations themselves, so they are independent, and they are normal if the $X_i$ are normal.

E. PARAMETRIC ESTIMATOR, p_7(t)

This paper considers one additional estimator, denoted p_7(t). It is a parametric estimator and is therefore not really a competitor to the thirteen non-parametric estimators considered here. In general, a parametric estimator would not be used if the functional form were regarded as unknown. Similarly, a non-parametric estimator would not normally be used if the survival function were strongly suspected to have a specified form.

p_7(t) is the well-known maximum likelihood estimator for the exponential distribution:

$$p_7(t) = e^{-t/\hat T}, \qquad \hat T = \frac{\sum_i t_i}{\text{number of observed deaths}}.$$

In our sample data base, the total observed survival time is 19, and three deaths are observed. Thus

$$\sum_i t_i = 1 + 2 + 3 + 6 + 7 = 19, \qquad \hat T = \frac{19}{3}, \qquad p_7(t) = e^{-3t/19}.$$

Calculations for selected times of interest yield the following estimates:

p_7(0) = 1.0
p_7(1) = 0.854
p_7(3) = 0.623
p_7(7) = 0.331

The thirteen non-parametric estimators are compared for a variety of generating distributions for both the death mechanism and the censoring mechanism.
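The exponential estimate p_7(t) takes only a few lines. This sketch (hypothetical function name) divides the total observed time, deaths and censorings alike, by the number of observed deaths, which is the standard censored-data maximum likelihood estimate of the exponential mean, and reproduces the four values listed for p_7.

```python
import math

def exponential_survival(times, died):
    """p_7(t) = exp(-t / T), T = total observed time / number of observed deaths."""
    mean_life = sum(times) / sum(died)
    return lambda t: math.exp(-t / mean_life)

times = [1, 2, 3, 6, 7]                          # sample data base (B and D censored)
died = [True, False, True, False, True]
p7 = exponential_survival(times, died)
print([round(p7(t), 3) for t in (0, 1, 3, 7)])   # [1.0, 0.854, 0.623, 0.331]
```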
IV. INSTRUCTIONS FOR USING PROGRAM

INPUT

Each input card bears nine variables. The distribution of time of death is entered in the first set of (five) columns, the censoring distribution in the second set of (ten) columns, a parameter of the censoring distribution in the third set of (ten) columns, the number of replications in the fourth set of (five) columns, and the number of events in the fifth set of (five) columns. The print option (all or partial output) is coded "0" or "1" in the sixth set of (five) columns, and the seed number is entered in the seventh set of (five) columns. After the card giving the time of the last event of a data set, a card with "0" or "1" in column 50 is inserted, "0" indicating that more data sets follow and "1" indicating the last data set. The t value is entered in the ninth set of (eight) columns.

The distributions of time of death and of censoring time are coded as follows:

Code   Type of Distribution
1      Uniform
2      Exponential
3      Delta function
OUTPUT 


The  output  lists: 

1) the time of each observed failure
2) the estimated survival probability at that time
3) the variance of that estimator
4) goodness-of-fit results:
   a) mean error
   b) mean absolute error (ABS)
   c) root-mean-square error (RMS)
5) the total number of observed deaths
6) the confidence interval at a particular time
Definition of Fortran Variables

NDIE    : the distribution of time of death
NTRUNC  : the distribution of censoring time
XTRUNC  : the parameter of the distribution of censoring time
NREPL   : number of replications
NEVENT  : number of events
NWRITE  : write all output or partial output of the simulation
NEND    : indicates more data sets or the last data set
TN      : t statistic value
p_1     : the estimator p_1(t)
p_2     : the estimator p_2(t)
p_3     : the estimator p_3(t)
p_4     : the estimator p_4(t)
p_5     : the estimator p_5(t)
p_6     : the estimator p_6(t)
p_7     : parametric estimator p_7(t)
p_8     : jackknife estimator of logistic transformation of p_4(t)
p_9     : jackknife estimator of logistic transformation of p_5(t)
p_10    : jackknife estimator of logistic transformation of p_6(t)
p_11    : Bayesian estimator of p_1(t)
p_12    : Bayesian estimator of p_2(t)
p_13    : Bayesian estimator of p_3(t)
p_14    : jackknife estimator of logistic transformation of p_2(t)
SL(I,J) : PSEUDO-value
SBAR    : average of pseudo-values
Var     : variance of estimator p(t)
Var_J   : variance of jackknife estimator
u(I,J)  : mean of goodness of fit (mean error)
w(I,J)  : absolute mean of goodness of fit (mean absolute error)
s(I,J)  : root-mean-square error
C_1     : upper confidence limit of p_14(t)
C_2     : lower confidence limit of p_14(t)
C_3     : upper confidence limit of p_8(t)
C_4     : lower confidence limit of p_8(t)
C_5     : upper confidence limit of p_9(t)
C_6     : lower confidence limit of p_9(t)
C_7     : upper confidence limit of p_10(t)
C_8     : lower confidence limit of p_10(t)
To compare the RMS errors of the product-limit estimator (p_2(t)) and the jackknife estimator of the logistic transformation (p_14(t)):

Run 1 (input card: 1 ?.0000 1000 20 1 505). [Per-event listing (event flag, event time, p_2(t), p_14(t)) not recoverable.] Summary rows at the grid values 0.100, ..., 0.900, in the order printed (? marks an illegible digit):

        0.100   0.200   0.300   0.400   0.500   0.600   0.700   0.800   0.900
MEAN   -0.002   0.000   0.002  -0.000  -0.001  -0.001   0.007   0.028   0.066
MEAN   -0.013  -0.007   0.004   0.015   0.029   0.044   0.?69   0.109   0.166
ABS     0.053   0.071   0.084   0.090   0.097   0.093   0.091   0.077   0.078
ABS     0.057   0.068   0.075   0.080   0.087   0.087   0.094   0.116   0.166
RMS     0.0??   0.091   0.105   0.113   0.119   0.115   0.113   0.100   0.102
RMS     0.071   0.085   0.094   0.101   0.108   0.109   0.120   0.140   0.185
CONF    0.965   0.951   0.901   0.841   0.781   0.722   0.6?1   0.678   0.751
CONF    0.767   0.591   0.459   0.355   0.267   0.1?5   0.116   0.057   0.005

Run 2 (input card: 2 4.0000 1000 10 1 1509). [Per-event listing and summary rows (MEAN, ABS, RMS, CONF, PER) not recoverable.]
Computer output of the fourteen estimators

Input card: 2 1 4.0000 1000 20 0 505. Mean-error (MEAN) rows at the grid values 0.100, ..., 0.900, in the order printed (? marks an illegible digit; the listing breaks off during the eighth row):

        0.100   0.200   0.300   0.400   0.500   0.600   0.700   0.800   0.900
MEAN   -0.033  -0.058  -0.070  -0.100  -0.114  -0.121  -0.114  -0.092  -0.0?9
MEAN   -0.002   0.000   0.002  -0.000  -0.001  -0.003   0.003   0.018   0.056
MEAN   -0.001   0.001   0.003   0.002   0.002   0.002   0.00?   0.0?6   0.067
MEAN   -0.055  -0.073  -0.086  -0.099  -0.106  -0.106  -0.0?4  -0.070  -0.029
MEAN   -0.021  -0.015  -0.010  -0.007  -0.003  -0.001   0.007   0.017   0.040
MEAN   -0.020  -0.014  -0.00?  -0.004   0.001   0.006   0.?18   0.037   0.077
MEAN   -0.00?  -0.007  -0.00?  -0.008  -0.007  -0.005   0.000   0.010   0.033
MEAN   -0.0?5  -0.069  -0.085  -0.099  -0.107  -0.109  -0.100
V. RESULTS OF THE SIMULATION

This section presents graphical comparisons of the 13 estimators based on simulated data. Each comparison is based on 1000 replications of a simulated data base. The bias and RMS error (square root of the mean-squared error) of each estimator depend on the parameters that control the simulated data base. No single estimator dominates all others under all conditions.

The bias and RMS errors of the estimators depend on several factors:

(A) The sample size (NEVENT) of individuals under observation at time zero affects the accuracy of the estimators. In general, a larger sample size leads to a better estimate than a smaller sample. Values of NEVENT selected for simulation are 5, 10, 25, and 50 (plus one simulation with NEVENT = 100).

(B) The distribution of times at which the observations are censored (unless the individual dies earlier) affects the performance of the various estimators. This distribution is particularly important in conjunction with the distribution of lifetimes: Do most individuals die before censoring is likely? Are deaths and censoring events about equally likely at all times? Are most observations censored before death? Three types of distributions are assumed to underlie the censoring mechanism:

(1) Some of the samples are generated on the assumption that no censoring occurs.

(2) Some samples are generated from a uniform distribution of times of censoring.

(3) Other data bases are generated from an exponential distribution of censoring times.

(C) The distribution of lifetimes (ignoring the possibility of censoring) also affects the performance of the various estimators. Two types of distributions are assumed to underlie the death mechanism:

(1) Some of the samples are generated from a uniform distribution of lifetimes.

(2) Other data bases are generated from an exponential distribution of lifetimes.

If a uniform distribution of lifetimes is selected, its range is always the interval from time 0 to time 1. If an exponential distribution is selected, it always has a mean lifetime of 1. The distributions of truncation times (uniform or exponential) have parameters 0.25, 0.5, 0.667, 0.75, 1, 1.333, 1.5, 2 and 4. A wide variety of samples may be simulated by mixing various pairs of distributions (for censoring times and deaths). Since the time units are arbitrary, fixing the lifetime scale in this way involves no loss of generality.
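One cell of this simulation design can be sketched as follows: draw lifetimes and censoring times, form the observed data, and accumulate the error of the product-limit estimate p_2(t) against the known p(t). This is an illustrative reconstruction, not the program of Appendix B. The parameterisations are assumptions: a uniform censoring distribution with parameter θ is taken as uniform on (0, θ), and an exponential one as having mean θ. Delta-function censoring is not modelled, and the function names are hypothetical.

```python
import math
import random

def draw(kind, param, rng):
    """One random time; parameterisations are assumptions:
    'uniform' on (0, param), 'exponential' with mean param, 'none' = never."""
    if kind == "uniform":
        return rng.uniform(0.0, param)
    if kind == "exponential":
        return rng.expovariate(1.0 / param)
    if kind == "none":
        return math.inf
    raise ValueError(f"unknown distribution: {kind}")

def product_limit_at(t, times, died):
    """Product-limit estimate of Pr(X > t); deaths precede censorings at ties."""
    p = 1.0
    for u in sorted({x for x, d in zip(times, died) if d and x <= t}):
        at_risk = sum(1 for x in times if x >= u)
        deaths = sum(1 for x, d in zip(times, died) if d and x == u)
        p *= 1.0 - deaths / at_risk
    return p

def simulate_errors(life, cens, cens_param, n_event, t, n_repl=1000, seed=505):
    """Mean error, mean absolute error and RMS error of p_2(t) over replications.

    Lifetimes are uniform on (0, 1) or exponential with mean 1, as in the text.
    """
    rng = random.Random(seed)
    true_p = max(0.0, 1.0 - t) if life == "uniform" else math.exp(-t)
    errors = []
    for _ in range(n_repl):
        x = [draw(life, 1.0, rng) for _ in range(n_event)]
        y = [draw(cens, cens_param, rng) for _ in range(n_event)]
        times = [min(a, b) for a, b in zip(x, y)]
        died = [a <= b for a, b in zip(x, y)]
        errors.append(product_limit_at(t, times, died) - true_p)
    mean = sum(errors) / n_repl
    mad = sum(abs(e) for e in errors) / n_repl
    rms = math.sqrt(sum(e * e for e in errors) / n_repl)
    return mean, mad, rms

print(simulate_errors("exponential", "uniform", 2.0, n_event=10, t=0.5))
```

The three returned quantities correspond to the MEAN, ABS and RMS rows of the program output; with no censoring the product-limit estimate reduces to the empirical survival fraction, so its mean error should be near zero.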

The form of the true survival function p(t) affects the relative performance of the 13 nonparametric estimators. For example, the Bayesian estimator p_12(t) tends to be better, as measured by the square root of the mean-squared error, than its counterpart (the product-limit estimator, p_2(t)) for the time frame in which

0.3 < p(t) < 0.9.

However, the product-limit estimator tends to be better for those times when p(t) is close to zero or unity.

The point estimators p_5(t) and p_6(t) tend to be better than the product-limit estimator (p_2(t)) for all time periods. The jackknife estimators of the logistic transformation of the point estimators (p_8(t), p_9(t), p_10(t)) tend to perform about the same as their counterpart point estimators (p_4(t), p_5(t), p_6(t)) for all time periods. Also, the estimator formed by jackknifing the logistic transformation of the product-limit estimator (p_14(t)) tends to be better than its counterpart, the product-limit estimator (p_2(t)), for the time frame in which

0.1 < p(t) < 0.7.

However, the product-limit estimator tends to be better for those times when p(t) is close to unity. The point estimators p_5(t) and p_6(t) tend to perform about the same for the time frame in which

0.1 < p_2(t) < 0.9.

However, p_5(t) tends to be better for those times when p(t) is close to unity.

The jackknife procedure may be validated, in an empirical sense, by sampling experiments or computer simulation in the following manner. First, times of censoring and death are obtained by drawing random numbers from postulated distributions. Second, the jackknifed estimator of the logistic-transformed product-limit estimate is found, and confidence limits are computed by the method of Tukey, reference (3). Since the true value of the survival function, p(t), is known, so is the theoretical value of λ = ln[p(t)/(1 - p(t))]. The jackknife confidence intervals can then be checked for coverage: if L ≤ λ ≤ H, the particular interval covers, while otherwise (if λ < L or H < λ) it does not. Finally, the above procedure can be repeated many times (say 1000), and the fraction of repetitions whose interval contains the true value of λ is recorded. This coverage fraction should desirably be close to 1 - α, the nominal confidence level.
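The coverage check just described can be sketched as a loop over simulated samples. Both lifetimes and censoring times are assumed exponential with mean 1 (one of the simulated configurations), and the function names are hypothetical. Each replication computes the jackknifed logit interval (L, H) for the product-limit estimate and records whether the true λ = ln[p(t)/(1 - p(t))] falls inside it. The t quantile is supplied by the caller.

```python
import math
import random

def product_limit_at(t, times, died):
    """Product-limit estimate of Pr(X > t); deaths precede censorings at ties."""
    p = 1.0
    for u in sorted({x for x, d in zip(times, died) if d and x <= t}):
        at_risk = sum(1 for x in times if x >= u)
        deaths = sum(1 for x, d in zip(times, died) if d and x == u)
        p *= 1.0 - deaths / at_risk
    return p

def logit_interval(t, times, died, t_quantile):
    """Jackknifed logit interval (L, H) for the product-limit estimate at t."""
    n = len(times)

    def ell(p, m):                          # corrected logit, c = 1/(2m)
        c = 1.0 / (2 * m)
        return math.log((p + c) / (1.0 - p + c))

    ell_n = ell(product_limit_at(t, times, died), n)
    z = [n * ell_n - (n - 1) * ell(product_limit_at(t, times[:i] + times[i + 1:],
                                                    died[:i] + died[i + 1:]), n - 1)
         for i in range(n)]
    z_bar = sum(z) / n
    s2 = sum((v - z_bar) ** 2 for v in z) / (n - 1)
    half = t_quantile * math.sqrt(s2 / n)
    return z_bar - half, z_bar + half

def coverage(t, n_event, t_quantile, n_repl=1000, seed=1):
    """Fraction of replications with L <= lambda <= H, lambda = ln[p(t)/(1 - p(t))].

    Assumes exponential lifetimes and censoring times, both with mean 1.
    """
    rng = random.Random(seed)
    p_true = math.exp(-t)
    lam = math.log(p_true / (1.0 - p_true))
    hits = 0
    for _ in range(n_repl):
        x = [rng.expovariate(1.0) for _ in range(n_event)]
        y = [rng.expovariate(1.0) for _ in range(n_event)]
        times = [min(a, b) for a, b in zip(x, y)]
        died = [a <= b for a, b in zip(x, y)]
        low, high = logit_interval(t, times, died, t_quantile)
        hits += low <= lam <= high
    return hits / n_repl

# N = 20, so t_{0.975}(19) = 2.093 for a nominal 95% interval.
print(coverage(0.5, n_event=20, t_quantile=2.093, n_repl=200))
```

The returned fraction is the empirical coverage to be compared with the nominal 1 - α.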
The jackknife confidence-limit procedure can be said to be robust in validity, reference (7), if the actual coverage is close to the nominal coverage, 1 - α, for a variety of distributions. This seems to be true for large n (n ≥ 50). However, the jackknife confidence limits do not cover accurately when the true value of p(t) is close to unity.

The following tables illustrate confidence limits for the jackknife method applied to the product-limit estimator (p_2(t)). Many computer-generated graphs are presented on the following pages to complete this section.
Fig. 1: Comparison of RMS derived from sample size 5.

(Step function estimators vs. point estimators)
Fig. 2: Comparison of root mean square derived from sample size 5.
(Step function estimators vs. jackknife estimators of
logistic transformation)

Fig. 3: Comparison of root mean square derived from sample size 5.
(Step function estimators vs. Bayesian estimators)
Fig. 4: Comparison of root mean square derived from sample size 10.
(Step function estimators vs. point estimators)

Fig. 5: Comparison of root mean square derived from sample size 10.
(Step function estimators vs. jackknife estimators of
logistic transformation)

Fig. 6: Comparison of RMS derived from sample size 10.
(Step function estimators vs. Bayesian estimators)

Fig. 7: Comparison of RMS derived from sample size 25.
(Step function estimators vs. point estimators)

Fig. 8: Comparison of RMS derived from sample size 25.
(Step function estimators vs. jackknife estimators of
logistic transformation)

Fig. 9: Comparison of RMS derived from sample size 25.
(Step function estimators vs. Bayesian estimators)
APPENDIX A
ESTIMATORS FOR GROUPED DATA

With grouped censored data, the definition of $p(t_i/t_{i-1})$ given by equation (5) does not hold unless the assumption is made that all truncations occur at the end of the time interval. If, on the other hand, it is assumed that all truncations occur at the beginning of $\Delta t_i$, the equivalent form of equation (5) is

$$p(t_i/t_{i-1}) = \frac{N_i - a_i - r_i}{N_i - a_i}, \qquad \text{(G-1)}$$

where N_i elements were present at the beginning of the interval, i.e., at time t_{i-1}, r_i elements failed during the interval, and a_i elements were truncated from the sample during the interval but prior to failing. As a hypothesis, assume that all aborts occur simultaneously somewhere within the time interval, so that r_i' failures occur prior to the truncations and the remaining r_i - r_i' after the truncations. Then

$$p(t_i/t_{i-1}) = \frac{N_i - r_i'}{N_i}\cdot\frac{N_i - a_i - r_i}{N_i - a_i - r_i'}. \qquad \text{(G-2)}$$

Thus, the value of $p(t_i/t_{i-1})$ depends on when the truncations occur. It is assumed that this is not known for the grouped data case. Nevertheless, it is possible to place limits on the value of $p(t_i/t_{i-1})$, since equation (G-2) always gives values between those of equations (5) and (G-1). Thus

$$\frac{N_i - a_i - r_i}{N_i - a_i} \le p(t_i/t_{i-1}) \le \frac{N_i - r_i}{N_i}. \qquad \text{(G-3)}$$
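The bounds can be checked numerically with exact fractions. This sketch (function names are hypothetical) evaluates (G-2) for every possible split r_i' of the failures and confirms that each value lies between the two bounds of (G-3). It assumes N_i - a_i - r_i' > 0 so that (G-2) is defined. The example interval is (1, 3] of the sample data base.

```python
from fractions import Fraction

def g2(n, a, r, r_before):
    """(G-2): r_before of the r failures occur before all a truncations."""
    return Fraction(n - r_before, n) * Fraction(n - a - r, n - a - r_before)

def g3_bounds(n, a, r):
    """(G-3): truncations at the start (lower bound) or the end (upper bound)."""
    return Fraction(n - a - r, n - a), Fraction(n - r, n)

# Interval (1, 3]: N = 4 at risk, a = 1 truncation (B), r = 1 failure (C).
lower, upper = g3_bounds(4, 1, 1)
print(lower, upper, [g2(4, 1, 1, k) for k in range(2)])   # r' = 0 and r' = 1
```

Setting r_i' = 0 recovers (G-1) and r_i' = r_i recovers equation (5), which is why (G-2) is trapped between them.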
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For  average  sample  size  approximation,  a  simpler  expression  from  the 

point  of  view  of  computational  ease  may  be  derived  by  substituting 

a/2  for  a  in  equation  (G-1)  giving 

tl  -  I  -  r 

p(t  )  = (G-4) 

II  -  - 
2 

The  equation  (G-4)  may  be  thought  of  as  the  result  of  assuming  that 

the  average  number  of  elements  in  the  time  interval  is  the  number  at 

the  beginning  decreased  by  half  the  number  of  truncations. 

Records  are  usually  available  to  provide  a  fairly  precise  time  the 
deatii  events.   In  the  medical  example,  the  exact  time  of  death  is 
usually  recorded  in  medical  records  required  by  law.   In  the  equipment 
lifetesting  example,  the  time  of  malfunction  or  failure  is  usually 
known  very  precisely  if  tlie  results  are  catastrophic;  and  maintenance 
records  give  a  reasonably  precise  time  even  if  the  failure  is  not 
critical  to  a  larger  system.   In  the  military  example,  the  event  of 
interest  is  usually  a  sensor  detection  or  some  other  action  that  is 
routinely  recorded  in  a  log  book. 

Equation (G-4) is a modification of the product-limit estimator for use when the times of truncation are known only in grouped form. Herd, reference (2), suggests a similar modification to the estimators of the second approach when truncation data are aggregated. Illustrative results for this method, based on the sample data base of the main text, are given below. Here, of course, we do not know that individual B dropped out of observation at time 2 and that individual D dropped out at time 6. We know only that the two truncations occurred in the intervals (1,3) and (3,7), respectively.
The product-limit modification is denoted by $p_2'(t)$ and Herd's modification by $p_5'(t)$. Their results on the sample data base are as follows.

$p_2'(t)$

    Interval   Computation          p2'(t)
    0-1        5/5                  1.0
    1-3        4/5 x 1.0            0.8
    3-7        2.5/3.5 x 0.8        0.571
    (7)        0.5/1.5 x 0.571      0.190

$p_5'(t)$

    t          Computation          p5'(t)
    0                               1.0
    1          5/6 x 1.0            0.833
    3          3.5/4.5 x 0.833      0.648
    7          1.5/2.5 x 0.648      0.389
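The two tables can be reproduced with a few lines of code. This is a sketch under stated assumptions, not the author's program.

- Each step is encoded as (death time, number under observation at the start of the preceding interval, truncations reported for that interval, deaths). These values are read off the worked example.
- The modified product-limit factor follows (G-4).
- The Herd-style factor $n'/(n'+1)$, with $n' = N_i - a_i/2$, is inferred from the numbers in the $p_5'(t)$ table for one death per step, not taken from reference (2).

Values are printed at each death time. The product-limit column prints 1.000, 0.800, 0.571, 0.190 and the Herd-style column prints 1.000, 0.833, 0.648, 0.389.

```python
# Each step: (death time, number observed at the start of the preceding interval,
# truncations reported only for that interval, deaths at the step).
# These values are read off the worked example above.
STEPS = [(1, 5, 0, 1), (3, 4, 1, 1), (7, 2, 1, 1)]

def grouped_estimates(steps):
    """Return (t, modified product limit, Herd-style estimate) after each step."""
    pl, herd = 1.0, 1.0
    rows = [(0, pl, herd)]
    for t, n, a, d in steps:
        if d != 1:
            raise ValueError("the Herd-style factor here assumes one death per step")
        n_eff = n - a / 2.0              # (G-4): half of the interval's truncations removed
        pl *= (n_eff - d) / n_eff        # modified product-limit factor
        herd *= n_eff / (n_eff + 1.0)    # factor inferred from the p5'(t) table
        rows.append((t, pl, herd))
    return rows

for t, pl, herd in grouped_estimates(STEPS):
    print(f"{t:>2}  {pl:.3f}  {herd:.3f}")
```

The product-limit table above labels each value by the interval that follows the death (for example, 0.8 over 1-3). The sketch instead reports the same numbers at the death times themselves.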
APPENDIX B


LISTING OF COMPUTER PROGRAM


C
C     TRUNCATED DATA PROGRAM
C


[DIMENSION statement: not legible in the source listing.]


    1 FORMAT(I5,I10,F10.4,5I5,F3.3)
    2 FORMAT(1X,'NDIE ERROR')
    3 FORMAT(1X,'NTRUNC ERROR')
    4 FORMAT(1X,'NREPL ERROR')
    5 FORMAT(1X,'NEVENT ERROR')
    6 FORMAT(1X,'NWRITE ERROR')
    7 FORMAT(3X,9F8.3)
    8 FORMAT(1X,2I5,14F8.5)
    9 FORMAT(1X,'P',I2,9F8.3,3X,'MEAN')
   10 FORMAT(...)
   11 FORMAT('1'...)
   12 FORMAT(1X,...5,10F10.5)
   13 FORMAT(1X,2I5,F10.4,4I5)
   14 FORMAT(1X,'P',I2,9F8.3,3X,'ABS')
   15 FORMAT(6X,9(I4,4X))
   16 FORMAT(1X,'P',I2,9F6.3,3X,'RMS')
   18 FORMAT(1X,2I5)


C
C     READ INPUTS AND SET INITIAL VALUES
C
      A=...01
  ..5 READ(5,1) NDIE,NTRUNC,XTRUNC,NREPL,NEVENT,NWRITE,ISEED,
     *NEND,TN
      WRITE(6,13) NDIE,NTRUNC,XTRUNC,NREPL,NEVENT,NWRITE,ISEED
      WRITE(6,10)
      NP=14
      SRH=SQRT(.5)
      DO 26 I=1,1000
      T(I)=0.0
      P1(I)=0.0
      P2(I)=0.0
      P3(I)=0.0
      P4(I)=0.0
      P5(I)=0.0
      P6(I)=0.0
      TJ(I)=0.0
      P11(I)=0.0
      P12(I)=0.0
      P13(I)=0.0
   26 PZ(I)=0.0


C
C     TEST INPUTS
C
      IF (NDIE.LT.1) GOTO 105
      IF (NDIE.GT.2) GOTO 105
      IF (NTRUNC.LT.1) GO TO 103
      IF (NTRUNC.GT.3) GOTO 103
      IF (NREPL.LT.1) GO TO 104
      IF (NREPL.GT.1000) GOTO 104
      IF (NEVENT.LT.2) GOTO 102
      IF (NEVENT.LE.1000) GOTO 200


C
C     ERROR MESSAGES
C
  102 WRITE (6,5)
      STOP
  103 WRITE (6,3)
      STOP
  104 WRITE (6,4)
      STOP
  105 WRITE (6,2)
  106 STOP

C
C     START MAIN CALCULATION
C
  200 DO 250 J=1,9
      NN(J)=0
      DO 250 I=1,NP
      S(I,J)=0
      U(I,J)=0
      C(I,J)=3.0
  250 W(I,J)=0
      DO 4999 IREPL=1,NREPL
      NDI=0
      DO 999 IEVENT=1,NEVENT
C
C     CREATE TTRUNC()
C
      CALL RANDOM(ISEED,TTR,1)
      GOTO (300,350,400),NTRUNC
      GOTO 103
  300 TTR=TTR*XTRUNC
      GOTO 500
  350 TTR=-XTRUNC*ALOG(TTR)
      GOTO 500
  400 TTR=XTRUNC
C
C     CREATE TDIE()
C
  500 CALL RANDOM(ISEED,TDI,1)
      GOTO (800,700),NDIE
      GOTO 102
  700 TDI=-ALOG(TDI)
C
C     DETERMINE SMALLER OF TDIE() AND TTRUNC()
C
  800 IF (TDI.LE.TTR) GOTO 810
      TT=TTR
      ITT=0
      GOTO 850
  810 TT=TDI
      NDI=NDI+1
      ITT=1
  850 T(IEVENT)=TT
      IT(IEVENT)=ITT
C
C     ORDER DATA
C
      IF (IEVENT.EQ.1) GOTO 999
      II=IEVENT-1
      DO 890 I=1,II
      IF (TT.GT.T(I)) GOTO 890
      III=II-I+1
      DO 880 J=1,III
      JJ=IEVENT-J
      IT(JJ+1)=IT(JJ)
  880 T(JJ+1)=T(JJ)
      IT(I)=ITT
      T(I)=TT
      GOTO 999
  890 CONTINUE
  999 CONTINUE
      TN=T(NEVENT)
      T7=0
      DII=NDI+2.
      IF (NDI.GT.0) GOTO 8122
      P4(NEVENT)=SRH
      P5(NEVENT)=SRH
      TT=0
      DO 8121 J=1,NEVENT
 8121 TT=TT+T(J)
      DN=TT/T(NEVENT)
      P6(NEVENT)=SQRT(DN/(DN+1.))
      GOTO 1111
C
C     CALCULATE P1() AND P4() AND P11() VECTORS AND T7 DATA
C
 8122 N=NDI
      J=0
      II=0
      III=0
      DO 2199 I=1,NEVENT
      T7=T7+T(I)
      IF (IT(I).EQ.1) GOTO 2150
      J=1
      GOTO 2199
 2150 P1(I)=FLOAT(N-1)/FLOAT(NDI)
      P4(I)=FLOAT(N)/FLOAT(NDI+1)
      P11(I)=FLOAT(N)/DII
      TTT=T(I)
      III=II
      II=I
      J=0
      N=N-1
 2199 CONTINUE
      T7=-NDI/T7
      IF (J.EQ.0) GOTO 2221
      TII=T(II)
      DTI=T(III)-TII
      IF (NDI.GT.1) GOTO 2200
      TAU=-TII/ALOG(P4(II))
      TTAU=-TII/ALOG(P11(II))
      GOTO 2210
 2200 TAU=DTI/ALOG(P4(II)/P4(III))
      TTAU=(T(III)-TII)/ALOG(P11(II)/P11(III))
 2210 DT=TN-TII
      IF (DT.GT.150*TAU) TAU=DT/150
      IF (DT.GT.150*TTAU) TTAU=DT/150.
      P11(NEVENT)=P11(II)*EXP(-DT/TTAU)
      P4(NEVENT)=P4(II)*EXP(-DT/TAU)
 2221 IF (NDI.NE.NEVENT) GO TO 2225
      DO 2224 I=1,NEVENT
      P2(I)=P1(I)
      P3(I)=P1(I)
      P12(I)=P11(I)
      P13(I)=P11(I)
      P5(I)=P4(I)
 2224 P6(I)=P4(I)
      GOTO 1111


C
C     CALCULATE P2() AND P5() AND P12() VECTORS
C
 2225 PP=1.
      N=NEVENT
      PPP=FLOAT(N+1)/FLOAT(N+2)
      P=1.
      J=0
      DO 2399 I=1,NEVENT
      IF (IT(I).EQ.1) GOTO 2350
      J=1
      GOTO 2399
 2350 PP=PP*FLOAT(N-1)/FLOAT(N)
      P=P*FLOAT(N)/FLOAT(N+1)
      PPP=PPP*FLOAT(N)/FLOAT(N+1)
      P2(I)=PP
      P5(I)=P
      P12(I)=PPP
      J=0
 2399 N=N-1
      IF (J.EQ.0) GOTO 2425
      IF (NDI.GT.1) GOTO 2400
      TAU=-TII/ALOG(P5(II))
      TTAU=-TII/ALOG(P12(II))
      GOTO 2410
 2400 TAU=DTI/ALOG(P5(II)/P5(III))
      TTAU=DTI/ALOG(P12(II)/P12(III))
 2410 IF (DT.GT.150*TAU) TAU=DT/150
      P5(NEVENT)=P*EXP(-DT/TAU)
      IF (DT.GT.150*TTAU) TTAU=DT/150.
      P12(NEVENT)=PPP*EXP(-DT/TTAU)
C
C     CALCULATE P3() AND P6() AND P13() VECTORS
C
 2425 PP=1.
      N=NEVENT
      P=1.
      PPP=FLOAT(N+1)/FLOAT(N+2)
      J=0
      TT=0
      TTT=0
      DO 2599 I=1,NEVENT
      IF (IT(I).EQ.1) GOTO 2550
      J=J+1
      TT=TT+T(I)-TTT
      GOTO 2599
 2550 DN=N
      IF (TT.NE.0) DN=DN+TT/(T(I)-TTT)
      PP=PP*(DN-1)/DN
      P=P*DN/(DN+1)
      PPP=PPP*DN/(DN+1.)
      P3(I)=PP
      P6(I)=P
      P13(I)=PPP
      J=0
      TT=0
      TTT=T(I)
 2599 N=N-1
      IF (J.EQ.0) GO TO 1111
      DTT=TN-TTT
      IF (DTT.GE.1E-70) GOTO 3005
      DN=.5*(J+1)
      GOTO 3010
 3005 DN=TT/DTT
 3010 P6(NEVENT)=P*SQRT(DN/(DN+1.))
      IF (NDI.GT.1) GO TO 1997
      TTAU=-TII/ALOG(P13(II))
      GOTO 1998
 1997 TTAU=DTI/ALOG(P13(II)/P13(III))
 1998 IF (DT.GT.150*TTAU) TTAU=DT/150.
      P13(NEVENT)=PPP*EXP(-DT/TTAU)
C
C     SET UP A LOOP FOR ALL JACKKNIFE CALLS (PJ4, PJ5, PJ6)
C
 1111 DO 1000 I=1,NEVENT
      DO 1011 J=1,NEVENT
      PJ2(I,J)=0.0
      PJ4(I,J)=0.0
      PJ5(I,J)=0.0
      PJ6(I,J)=0.0
      SL1(I,J)=0.0
      SL2(I,J)=0.0
      SL3(I,J)=0.0
      SL4(I,J)=0.0
 1011 CONTINUE
      PJA2(I)=0.0
      PJA4(I)=0.0
      PJA5(I)=0.0
      PJA6(I)=0.0
      SBAR1(I)=0.0
      SBAR2(I)=0.0
      SBAR3(I)=0.0
      SBAR4(I)=0.0
      PZ2(I)=0.0
      VARJ2(I)=0.0
      VARJ4(I)=0.0
      VARJ5(I)=0.0
      VARJ6(I)=0.0
 1000 CONTINUE
      DO 1001 I=1,NEVENT

C
C     MOVE DATA INTO TJ() AND ITJ() VECTORS
C
      K=1
      JNEXT=0
      JBEFOR=0
      JAFTER=0
 1002 IF (K.EQ.I) GO TO 7003
      TJ(K)=T(K)
      ITJ(K)=IT(K)
      IF (ITJ(K).EQ.0) GO TO 7001
      JNEXT=JBEFOR
      JBEFOR=K
 7001 K=K+1
      GO TO 1002
 7002 JAFTER=JBEFOR
      JBEFOR=JNEXT
      GO TO 1010
 7003 IF (I.GT.JEVENT) GO TO 7002
 1003 IF (K.GT.JEVENT) GO TO 1004
      TJ(K)=T(K+1)
      ITJ(K)=IT(K+1)
      IF (ITJ(K).EQ.0) GO TO 4002
      IF (JAFTER.EQ.0) JAFTER=K
 4002 K=K+1
      GO TO 1003
 1004 IF (JAFTER.EQ.0) JAFTER=JEVENT
 1010 NDIJ=NDI-IT(I)
C
C     CHECK IF ZERO DEATHS
C
      IF (NDIJ.EQ.0) GOTO 1001
      N=NDIJ
      P=1.
      J=0
      II=0
      III=0
      GO TO 1014
C
C     CALCULATE PJ4() VECTORS
C
 1014 DO 1016 IJ=1,JEVENT
      IF (ITJ(IJ).EQ.1) GO TO 1015
      J=1
      GO TO 1016
 1015 P=FLOAT(N)/FLOAT(NDIJ+1)
      PZ(IJ)=P
      III=II
      II=IJ
      J=0
      N=N-1
 1016 CONTINUE
      IF (J.EQ.0) GO TO 1019
      TII=TJ(II)
      DTI=TJ(III)-TII
      IF (NDIJ.GT.1) GO TO 1017
      TAU=-TII/ALOG(PZ(II))
      GO TO 1018
 1017 TAU=DTI/ALOG(PZ(II)/PZ(III))
 1018 DT=TJ(JEVENT)-TII
      IF (DT.GT.150.*TAU) TAU=DT/150
      PZ(JEVENT)=P*EXP(-DT/TAU)
 1019 K=1
 1020 IF (K.EQ.I) GO TO 2021
      PJ4(K,I)=ALOG((PZ(K)+A)/(1-PZ(K)+A))
      K=K+1
      GO TO 1020
 2021 IF (K.EQ.NEVENT) GO TO 5021
 5021 IF (IT(I).EQ.0) GO TO 1025
      TJA=TJ(JAFTER)
      IF (JBEFOR.NE.0) GO TO 1022
      PX=ALOG(PZ(JAFTER))*T(I)/TJA
      PZZ=0.0
      IF (PX.LT.-150) GOTO 7025
      PZZ=EXP(PX)
 7025 PJ4(I,I)=ALOG((PZZ+A)/(1-PZZ+A))
      GO TO 1025
 1022 TJB=TJ(JBEFOR)
      DTJ=TJA-TJB
      DTB=T(I)-TJB
      PX=DTB*ALOG(PZ(JAFTER)/PZ(JBEFOR))/DTJ
      PZZ=0.0
      IF (PX.LT.-150) GOTO 7026
      PZZ=PZ(JBEFOR)*EXP(PX)
 7026 PJ4(I,I)=ALOG((PZZ+A)/(1-PZZ+A))
 1025 IF (K.GT.JEVENT) GOTO 3027
      PJ4(K+1,I)=ALOG((PZ(K)+A)/(1-PZ(K)+A))
      K=K+1
      GO TO 1025
 3027 IF (NDI.NE.NEVENT) GOTO 1026
      DO 3028 K=1,NEVENT
      DO 3028 L=1,NEVENT
      PJ5(K,L)=PJ4(K,L)
 3028 PJ6(K,L)=PJ4(K,L)
      GOTO 1001


C
C     CALCULATE PJ5() VECTORS
C
 1026 N=JEVENT
      P=1.
      PP=1.
      J=0
      DO 1028 IJ=1,JEVENT
      IF (ITJ(IJ).EQ.1) GO TO 1027
      J=1
      GO TO 1028
 1027 P=P*FLOAT(N)/FLOAT(N+1)
      PP=PP*FLOAT(N-1)/FLOAT(N)
      PZ(IJ)=P
      PZ2(IJ)=PP
      J=0
 1028 N=N-1
      IF (J.EQ.0) GO TO 1031
      IF (NDIJ.GT.1) GO TO 1029
      TAU=-TII/ALOG(PZ(II))
      GO TO 1030
 1029 TAU=DTI/ALOG(PZ(II)/PZ(III))
 1030 IF (DT.GT.150*TAU) TAU=DT/150.
      PZ(JEVENT)=P*EXP(-DT/TAU)
 1031 K=1
 1032 IF (K.EQ.I) GO TO 2033
      PJ5(K,I)=ALOG((PZ(K)+A)/(1-PZ(K)+A))
      PJ2(K,I)=ALOG((PZ2(K)+A)/(1-PZ2(K)+A))
      K=K+1
      GO TO 1032
 2033 IF (K.EQ.NEVENT) GO TO 5033
 5033 IF (IT(I).EQ.0) GO TO 1136
      IF (JBEFOR.NE.0) GO TO 1135
      PX=T(I)*ALOG(PZ(JAFTER))/TJA
      PZZ=0.0
      IF (PX.LT.-150) GOTO 7027
      PZZ=EXP(PX)
 7027 PJ5(I,I)=ALOG((PZZ+A)/(1-PZZ+A))
      GO TO 1136
 1135 PX=DTB*ALOG(PZ(JAFTER)/PZ(JBEFOR))/DTJ
      PZZ=0.0
      IF (PX.LT.-150) GOTO 7028
      PZZ=PZ(JBEFOR)*EXP(PX)
 7028 PJ5(I,I)=ALOG((PZZ+A)/(1-PZZ+A))
 1136 IF (K.GT.JEVENT) GO TO 1036
      PJ5(K+1,I)=ALOG((PZ(K)+A)/(1-PZ(K)+A))
      PJ2(K+1,I)=ALOG((PZ2(K)+A)/(1-PZ2(K)+A))
      K=K+1
      GO TO 1136

C
C     CALCULATE PJ6() VECTORS
C
 1036 N=JEVENT
      P=1.
[Remainder of the PJ6 calculation, through statement 1001, illegible in the source listing.]